Network Working Group                                           T. Daede
Internet-Draft                                                   Mozilla
Intended status: Informational                                 A. Norkin
Expires: May 4, 2017                                             Netflix
                                                          I. Brailovskiy
                                                            Amazon Lab126
                                                         October 31, 2016

             Video Codec Testing and Quality Measurement
                      draft-ietf-netvc-testing-04
Abstract

   This document describes guidelines and procedures for evaluating a
   video codec.  This covers subjective and objective tests, test
   conditions, and materials used for the test.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
skipping to change at page 1, line 35
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 4, 2017.
Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.
Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Subjective quality tests  . . . . . . . . . . . . . . . . . .   3
     2.1.  Still Image Pair Comparison . . . . . . . . . . . . . . .   3
     2.2.  Video Pair Comparison . . . . . . . . . . . . . . . . . .   3
     2.3.  Subjective viewing test . . . . . . . . . . . . . . . . .   4
   3.  Objective Metrics . . . . . . . . . . . . . . . . . . . . . .   4
     3.1.  Overall PSNR  . . . . . . . . . . . . . . . . . . . . . .   4
     3.2.  Frame-averaged PSNR . . . . . . . . . . . . . . . . . . .   5
     3.3.  PSNR-HVS-M  . . . . . . . . . . . . . . . . . . . . . . .   5
     3.4.  SSIM  . . . . . . . . . . . . . . . . . . . . . . . . . .   5
     3.5.  Multi-Scale SSIM  . . . . . . . . . . . . . . . . . . . .   5
     3.6.  CIEDE2000 . . . . . . . . . . . . . . . . . . . . . . . .   5
     3.7.  VMAF  . . . . . . . . . . . . . . . . . . . . . . . . . .   6
   4.  Comparing and Interpreting Results  . . . . . . . . . . . . .   6
     4.1.  Graphing  . . . . . . . . . . . . . . . . . . . . . . . .   6
     4.2.  BD-Rate . . . . . . . . . . . . . . . . . . . . . . . . .   6
     4.3.  Ranges  . . . . . . . . . . . . . . . . . . . . . . . . .   7
   5.  Test Sequences  . . . . . . . . . . . . . . . . . . . . . . .   7
     5.1.  Sources . . . . . . . . . . . . . . . . . . . . . . . . .   7
     5.2.  Test Sets . . . . . . . . . . . . . . . . . . . . . . . .   8
       5.2.1.  regression-1  . . . . . . . . . . . . . . . . . . . .   8
       5.2.2.  objective-1 . . . . . . . . . . . . . . . . . . . . .   8
       5.2.3.  objective-1-fast  . . . . . . . . . . . . . . . . . .  11
     5.3.  Operating Points  . . . . . . . . . . . . . . . . . . . .  13
       5.3.1.  Common settings . . . . . . . . . . . . . . . . . . .  13
       5.3.2.  High Latency CQP  . . . . . . . . . . . . . . . . . .  13
       5.3.3.  Low Latency CQP . . . . . . . . . . . . . . . . . . .  13
       5.3.4.  Unconstrained High Latency  . . . . . . . . . . . . .  14
       5.3.5.  Unconstrained Low Latency . . . . . . . . . . . . . .  14
   6.  Automation  . . . . . . . . . . . . . . . . . . . . . . . . .  14
     6.1.  Regression tests  . . . . . . . . . . . . . . . . . . . .  15
     6.2.  Objective performance tests . . . . . . . . . . . . . . .  15
     6.3.  Periodic tests  . . . . . . . . . . . . . . . . . . . . .  15
   7.  Informative References  . . . . . . . . . . . . . . . . . . .  16
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  17
1.  Introduction

   When developing a video codec, changes and additions to the codec
   need to be decided based on their performance tradeoffs.  In
   addition, measurements are needed to determine when the codec has
   met its performance goals.  This document specifies how the tests
   are to be carried out to ensure valid comparisons when evaluating
   changes
skipping to change at page 5, line 35

   scores.
3.4.  SSIM

   SSIM (Structural Similarity Image Metric) is a still image quality
   metric introduced in 2004 [SSIM].  It computes a score for each
   individual pixel, using a window of neighboring pixels.  These
   scores can then be averaged to produce a global score for the
   entire image.  The original paper produces scores ranging between 0
   and 1.

   To linearize the metric for BD-Rate computation, the score is
   converted into a nonlinear decibel scale:

   -10 * log10 (1 - SSIM)
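
   As an illustration (not part of the draft's tooling), the
   conversion can be written as a small Python helper; the function
   name is chosen here for exposition only:

   import math

   def ssim_to_db(ssim):
       # Map a [0, 1) SSIM score onto a decibel scale, spreading out
       # the differences between scores close to 1.0.
       return -10.0 * math.log10(1.0 - ssim)

   # Example: ssim_to_db(0.99) is about 20 dB, ssim_to_db(0.999)
   # about 30 dB.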
3.5.  Multi-Scale SSIM

   Multi-Scale SSIM is SSIM extended to multiple window sizes [MSSSIM].
   The metric score is converted to decibels in the same way as SSIM.
3.6.  CIEDE2000

   CIEDE2000 is a metric based on CIEDE color distances [CIEDE2000].
   It generates a single score taking into account all three chroma
   planes.  It does not take into consideration any structural
   similarity or other psychovisual effects.
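
   As an illustrative sketch only (the draft does not prescribe this
   implementation), a per-frame score in the same spirit can be
   computed with scikit-image, which provides a CIEDE2000 distance:

   import numpy as np
   from skimage.color import rgb2lab, deltaE_ciede2000

   def mean_ciede2000(ref_rgb, test_rgb):
       # Inputs are HxWx3 float arrays in [0, 1].  Convert both frames
       # to CIE Lab, take the CIEDE2000 distance at every pixel, and
       # average over the frame.
       delta = deltaE_ciede2000(rgb2lab(ref_rgb), rgb2lab(test_rgb))
       return float(np.mean(delta))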
3.7.  VMAF

   Video Multi-method Assessment Fusion (VMAF) is a full-reference
   perceptual video quality metric that aims to approximate human
   perception of video quality [VMAF].  This metric is focused on
   quality degradation due to compression and rescaling.  VMAF
   estimates the perceived quality score by computing scores from
   multiple quality assessment algorithms, and fusing them using a
   support vector machine (SVM).  Currently, three image fidelity
   metrics and one temporal signal have been chosen as features to the
   SVM, namely Anti-noise SNR (ANSNR), Detail Loss Measure (DLM),
   Visual Information Fidelity (VIF), and the mean co-located pixel
   difference of a frame with respect to the previous frame.
   The quality score from VMAF is used directly to calculate BD-Rate,
   without any conversions.
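
   The fusion step can be illustrated with scikit-learn; the feature
   values and training scores below are placeholders for exposition,
   not VMAF's actual model or training data:

   import numpy as np
   from sklearn.svm import SVR

   # One row per frame: [ANSNR, DLM, VIF, mean co-located pixel
   # difference]; y holds the matching subjective quality scores.
   X_train = np.array([[28.1, 0.95, 0.90, 1.2],
                       [22.4, 0.80, 0.75, 4.8],
                       [18.9, 0.62, 0.55, 9.3]])
   y_train = np.array([85.0, 60.0, 35.0])

   model = SVR(kernel="rbf")  # fuse the four features with an SVM
   model.fit(X_train, y_train)
   predicted = model.predict([[25.0, 0.88, 0.82, 2.5]])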
4.  Comparing and Interpreting Results

4.1.  Graphing

   When displayed on a graph, bitrate is shown on the X axis, and the
   quality metric is on the Y axis.  For publication, the X axis
   should be linear.  The Y axis metric should be plotted in decibels.
   If the quality metric does not natively report quality in decibels,
   it should be converted as described in the previous section.
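
   A minimal plotting sketch following these conventions, with made-up
   rate/quality points purely for illustration:

   import matplotlib.pyplot as plt

   bitrate_kbps = [250, 500, 1000, 2000]    # X axis, linear rate
   quality_db = [32.1, 35.4, 38.2, 40.9]    # Y axis, in decibels

   plt.plot(bitrate_kbps, quality_db, marker="o", label="test codec")
   plt.xlabel("Bitrate (kbps)")
   plt.ylabel("Quality (dB)")
   plt.legend()
   plt.savefig("rd_curve.png")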
skipping to change at page 7, line 30
   o  The log-rate is numerically integrated over the metric range for
      each curve, using at least 1000 samples and trapezoidal
      integration.

   o  The resulting integrated log-rates are converted back into
      linear rate, and then the percent difference is calculated from
      the reference to the test codec, as in the sketch below.
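
   A minimal sketch of this procedure, assuming each codec run is
   summarized as (bitrate, quality-in-dB) pairs and that a monotone
   piecewise-cubic fit is an acceptable stand-in for the elided
   interpolation step:

   import numpy as np
   from scipy.interpolate import PchipInterpolator

   def bd_rate_percent(ref_pts, test_pts, lo, hi, samples=1000):
       # ref_pts/test_pts: (bitrate, quality_db) pairs for one codec
       # each; lo/hi bound the overlapping quality range.
       grid = np.linspace(lo, hi, samples)
       avg_log_rate = []
       for pts in (ref_pts, test_pts):
           pts = sorted(pts, key=lambda p: p[1])  # ascending quality
           quality = np.array([p[1] for p in pts])
           log_rate = np.log([p[0] for p in pts])
           # Fit log-rate as a function of quality, then integrate
           # with the trapezoidal rule and average over the range.
           fit = PchipInterpolator(quality, log_rate)
           avg_log_rate.append(np.trapz(fit(grid), grid) / (hi - lo))
       # Convert the averaged log-rates back to linear rate and report
       # the percent difference from the reference to the test codec.
       ref_rate, test_rate = np.exp(avg_log_rate)
       return 100.0 * (test_rate - ref_rate) / ref_rate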
4.3.  Ranges

   For individual feature changes in libaom or libvpx, the overlap BD-
   Rate method with quantizers 20, 32, 43, and 55 must be used.

   For the final evaluation described in [I-D.ietf-netvc-requirements],
   the quantizers used are 20, 24, 28, 32, 36, 39, 43, 47, 51, and 55.
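
   For example, the two quantizer sets can be kept as named constants
   and fed to an encode-and-measure harness; the harness function here
   is hypothetical:

   FEATURE_QUANTIZERS = [20, 32, 43, 55]
   FINAL_QUANTIZERS = [20, 24, 28, 32, 36, 39, 43, 47, 51, 55]

   # ref_pts = [encode_and_measure(ref, q) for q in FEATURE_QUANTIZERS]
   # test_pts = [encode_and_measure(test, q) for q in FEATURE_QUANTIZERS]
   # print(bd_rate_percent(ref_pts, test_pts, lo=32.0, hi=42.0))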
5.  Test Sequences

5.1.  Sources

   Lossless test clips are preferred for most tests, because the
   structure of compression artifacts in already-compressed clips may
   introduce extra noise in the test results.  However, a large amount
   of content on the internet needs to be recompressed at least once,
   so some sources of this nature are useful.  The encoder should run
   at
skipping to change at page 16, line 7

   substituted.
6.3.  Periodic tests

   Periodic tests are run on a wide range of bitrates in order to
   gauge progress over time, as well as detect potential regressions
   missed by other tests.
7.  Informative References

   [AWCY]     Xiph.Org, "Are We Compressed Yet?", 2016,
              <https://arewecompressedyet.com/>.

   [BT500]    ITU-R, "Recommendation ITU-R BT.500-13", 2012,
              <https://www.itu.int/dms_pubrec/itu-r/rec/bt/
              R-REC-BT.500-13-201201-I!!PDF-E.pdf>.

   [CIEDE2000]
              Yang, Y., Ming, J., and N. Yu, "Color Image Quality
              Assessment Based on CIEDE2000", 2012,
              <http://dx.doi.org/10.1155/2012/273723>.

   [COMPARECODECS]
              Alvestrand, H., "Compare Codecs", 2015,
              <http://compare-codecs.appspot.com/>.

   [DAALA-GIT]
              Xiph.Org, "Daala Git Repository", 2015,
              <http://git.xiph.org/?p=daala.git;a=summary>.

   [DERFVIDEO]
              Terriberry, T., "Xiph.org Video Test Media", n.d.,
              <https://media.xiph.org/video/derf/>.

   [FASTSSIM] Chen, M. and A. Bovik, "Fast structural similarity index
              algorithm", 2010,
              <http://live.ece.utexas.edu/publications/2011/
              chen_rtip_2011.pdf>.
   [I-D.ietf-netvc-requirements]
              Filippov, A., Norkin, A., and J. Alvarez, "Video Codec
              Requirements and Evaluation Methodology", draft-ietf-
              netvc-requirements-02 (work in progress), June 2016.
   [L1100]    Bossen, F., "Common test conditions and software
              reference configurations", JCTVC L1100, 2013,
              <http://phenix.int-evry.fr/jct/>.

   [MSSSIM]   Wang, Z., Simoncelli, E., and A. Bovik, "Multi-Scale
              Structural Similarity for Image Quality Assessment",
              n.d.,
              <http://www.cns.nyu.edu/~zwang/files/papers/msssim.pdf>.

   [PSNRHVS]  Egiazarian, K., Astola, J., Ponomarenko, N., Lukin, V.,
skipping to change at page 17, line 22
   [SSIM]     Wang, Z., Bovik, A., Sheikh, H., and E. Simoncelli,
              "Image Quality Assessment: From Error Visibility to
              Structural Similarity", 2004,
              <http://www.cns.nyu.edu/pub/eero/wang03-reprint.pdf>.

   [STEAM]    Valve Corporation, "Steam Hardware & Software Survey:
              June 2015", June 2015,
              <http://store.steampowered.com/hwsurvey>.

   [TESTSEQUENCES]
              Daede, T., "Test Sets", n.d.,
              <https://people.xiph.org/~tdaede/sets/>.

   [VMAF]     Aaron, A., Li, Z., Manohara, M., Lin, J., Wu, E., and C.
              Kuo, "VMAF - Video Multi-Method Assessment Fusion", 2015,
              <https://github.com/Netflix/vmaf>.
Authors' Addresses

   Thomas Daede
   Mozilla