draft-ietf-payload-vp9-12.txt   draft-ietf-payload-vp9-13.txt 
AVTCore Working Group J. Uberti AVTCore Working Group J. Uberti
Internet-Draft S. Holmer Internet-Draft S. Holmer
Intended status: Standards Track M. Flodman Intended status: Standards Track M. Flodman
Expires: 3 October 2021 D. Hong Expires: 8 November 2021 D. Hong
Google Google
J. Lennox J. Lennox
8x8 / Jitsi 8x8 / Jitsi
1 April 2021 7 May 2021
RTP Payload Format for VP9 Video RTP Payload Format for VP9 Video
draft-ietf-payload-vp9-12 draft-ietf-payload-vp9-13
Abstract Abstract
This memo describes an RTP payload format for the VP9 video codec. This specification describes an RTP payload format for the VP9 video
The payload format has wide applicability, as it supports codec. The payload format has wide applicability, as it supports
applications from low bit-rate peer-to-peer usage, to high bit-rate applications from low bit-rate peer-to-peer usage, to high bit-rate
video conferences. It includes provisions for temporal and spatial video conferences. It includes provisions for temporal and spatial
scalability. scalability.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 3 October 2021. This Internet-Draft will expire on 8 November 2021.
Copyright Notice Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 26 skipping to change at page 2, line 26
4.2.1. Scalability Structure (SS): . . . . . . . . . . . . . 11 4.2.1. Scalability Structure (SS): . . . . . . . . . . . . . 11
4.3. Frame Fragmentation . . . . . . . . . . . . . . . . . . . 12 4.3. Frame Fragmentation . . . . . . . . . . . . . . . . . . . 12
4.4. Scalable encoding considerations . . . . . . . . . . . . 13 4.4. Scalable encoding considerations . . . . . . . . . . . . 13
4.5. Examples of VP9 RTP Stream . . . . . . . . . . . . . . . 13 4.5. Examples of VP9 RTP Stream . . . . . . . . . . . . . . . 13
4.5.1. Reference picture use for scalable structure . . . . 13 4.5.1. Reference picture use for scalable structure . . . . 13
5. Feedback Messages and Header Extensions . . . . . . . . . . . 14 5. Feedback Messages and Header Extensions . . . . . . . . . . . 14
5.1. Reference Picture Selection Indication (RPSI) . . . . . . 14 5.1. Reference Picture Selection Indication (RPSI) . . . . . . 14
5.2. Full Intra Request (FIR) . . . . . . . . . . . . . . . . 15 5.2. Full Intra Request (FIR) . . . . . . . . . . . . . . . . 15
5.3. Layer Refresh Request (LRR) . . . . . . . . . . . . . . . 15 5.3. Layer Refresh Request (LRR) . . . . . . . . . . . . . . . 15
6. Payload Format Parameters . . . . . . . . . . . . . . . . . . 16 6. Payload Format Parameters . . . . . . . . . . . . . . . . . . 16
6.1. Media Type Definition . . . . . . . . . . . . . . . . . . 16 6.1. SDP Parameters . . . . . . . . . . . . . . . . . . . . . 17
6.2. SDP Parameters . . . . . . . . . . . . . . . . . . . . . 19 6.1.1. Mapping of Media Subtype Parameters to SDP . . . . . 18
6.2.1. Mapping of Media Subtype Parameters to SDP . . . . . 19 6.1.2. Offer/Answer Considerations . . . . . . . . . . . . . 18
6.2.2. Offer/Answer Considerations . . . . . . . . . . . . . 20 7. Media Type Definition . . . . . . . . . . . . . . . . . . . . 19
7. Security Considerations . . . . . . . . . . . . . . . . . . . 20 8. Security Considerations . . . . . . . . . . . . . . . . . . . 21
8. Congestion Control . . . . . . . . . . . . . . . . . . . . . 21 9. Congestion Control . . . . . . . . . . . . . . . . . . . . . 21
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21
10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 21 11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 22
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 21 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 22
11.1. Normative References . . . . . . . . . . . . . . . . . . 21 12.1. Normative References . . . . . . . . . . . . . . . . . . 22
11.2. Informative References . . . . . . . . . . . . . . . . . 23 12.2. Informative References . . . . . . . . . . . . . . . . . 23
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 23 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 24
1. Introduction 1. Introduction
This memo describes an RTP payload specification applicable to the This specification describes an RTP payload specification applicable
transmission of video streams encoded using the VP9 video codec to the transmission of video streams encoded using the VP9 video
[VP9-BITSTREAM]. The format described in this document can be used codec [VP9-BITSTREAM]. The format described in this document can be
both in peer-to-peer and video conferencing applications. used both in peer-to-peer and video conferencing applications.
The VP9 video codec was developed by Google, and is the successor to The VP9 video codec was developed by Google, and is the successor to
its earlier VP8 [RFC6386] codec. Beyond the compression improvements its earlier VP8 [RFC6386] codec. Beyond the compression improvements
and other general enhancements over VP8, VP9 is also designed in a and other general enhancements over VP8, VP9 is also designed in a
way that allows spatially-scalable video encoding. way that allows spatially-scalable video encoding.
2. Conventions, Definitions and Acronyms 2. Conventions, Definitions and Acronyms
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
skipping to change at page 5, line 5 skipping to change at page 5, line 5
temporal layer hierarchies and patterns which are changing temporal layer hierarchies and patterns which are changing
dynamically. dynamically.
In non-flexible mode, the inter-picture dependency (the reference In non-flexible mode, the inter-picture dependency (the reference
indices) of a Picture Group (PG) MUST be pre-specified as part of the indices) of a Picture Group (PG) MUST be pre-specified as part of the
scalability structure (SS) data. In this mode, each packet has an scalability structure (SS) data. In this mode, each packet has an
index to refer to one of the described pictures in the PG, from which index to refer to one of the described pictures in the PG, from which
the pictures referenced by the picture transmitted in the current the pictures referenced by the picture transmitted in the current
packet for inter-picture prediction can be identified. packet for inter-picture prediction can be identified.
(Editor's Note: A "Picture Group", as used in this document, is not (Note: A "Picture Group", as used in this document, is not the same
the same thing as the term "Group of Pictures" as it is thing as the term "Group of Pictures" as it is traditionally used
traditionally used in video coding, i.e. to mean an independently- in video coding, i.e. to mean an independently-decodable run of
decodable run of pictures beginning with a keyframe. Suggestions pictures beginning with a keyframe.)
for better terminology are welcome.)
The SS data can also be used to specify the resolution of each The SS data can also be used to specify the resolution of each
spatial layer present in the VP9 stream for both flexible and non- spatial layer present in the VP9 stream for both flexible and non-
flexible modes. flexible modes.
4. Payload Format 4. Payload Format
This section describes how the encoded VP9 bitstream is encapsulated This section describes how the encoded VP9 bitstream is encapsulated
in RTP. To handle network losses, usage of RTP/AVPF [RFC4585] is in RTP. To handle network losses, usage of RTP/AVPF [RFC4585] is
RECOMMENDED. All integer fields in the specifications are encoded as RECOMMENDED. All integer fields in the specifications are encoded as
skipping to change at page 6, line 24 skipping to change at page 6, line 24
spatial layer frame (the final packet of the picture), and 0 spatial layer frame (the final packet of the picture), and 0
otherwise. Unless spatial scalability is in use for this picture, otherwise. Unless spatial scalability is in use for this picture,
this will have the same value as the E bit described below. Note this will have the same value as the E bit described below. Note
this bit MUST be set to 1 for the target spatial layer frame if a this bit MUST be set to 1 for the target spatial layer frame if a
stream is being rewritten to remove higher spatial layers. stream is being rewritten to remove higher spatial layers.
Payload Type (PT): In line with the policy in Section 3 of Payload Type (PT): In line with the policy in Section 3 of
[RFC3551], applications using the VP9 RTP payload profile MUST [RFC3551], applications using the VP9 RTP payload profile MUST
assign a dynamic payload type number to be used in each RTP assign a dynamic payload type number to be used in each RTP
session and provide a mechanism to indicate the mapping. See session and provide a mechanism to indicate the mapping. See
Section 6.2 for the mechanism to be used with the Session Section 6.1 for the mechanism to be used with the Session
Description Protocol (SDP) [RFC8866]. Description Protocol (SDP) [RFC8866].
Timestamp: The RTP timestamp indicates the time when the input frame Timestamp: The RTP timestamp indicates the time when the input frame
was sampled, at a clock rate of 90 kHz. If the input picture is was sampled, at a clock rate of 90 kHz. If the input picture is
encoded with multiple layer frames, all of the frames of the encoded with multiple layer frames, all of the frames of the
picture MUST have the same timestamp. picture MUST have the same timestamp.
If a frame has the VP9 show_frame field set to 0 (i.e., it is If a frame has the VP9 show_frame field set to 0 (i.e., it is
meant only to populate a reference buffer, without being output) meant only to populate a reference buffer, without being output)
its timestamp MAY alternately be set to be the same as the its timestamp MAY alternatively be set to be the same as the
subsequent frame with show_frame equal to 1. (This will be subsequent frame with show_frame equal to 1. (This will be
convenient for playing out pre-encoded content packaged with VP9 convenient for playing out pre-encoded content packaged with VP9
"superframes", which typically bundle show_frame==0 frames with a "superframes", which typically bundle show_frame==0 frames with a
subsequent show_frame==1 frame.) Every frame with show_frame==1, subsequent show_frame==1 frame.) Every frame with show_frame==1,
however, MUST have a unique timestamp modulo the 2^32 wrap of the however, MUST have a unique timestamp modulo the 2^32 wrap of the
field. field.
The remaining RTP Fixed Header Fields (V, P, X, CC, sequence number, The remaining RTP Fixed Header Fields (V, P, X, CC, sequence number,
SSRC and CSRC identifiers) are used as specified in Section 5.1 of SSRC and CSRC identifiers) are used as specified in Section 5.1 of
[RFC3550]. [RFC3550].
4.2. VP9 Payload Descriptor 4.2. VP9 Payload Descriptor
In flexible mode (with the F bit below set to 1), The first octets In flexible mode (with the F bit below set to 1), the first octets
after the RTP header are the VP9 payload descriptor, with the after the RTP header are the VP9 payload descriptor, with the
following structure. following structure.
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|I|P|L|F|B|E|V|Z| (REQUIRED) |I|P|L|F|B|E|V|Z| (REQUIRED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
I: |M| PICTURE ID | (REQUIRED) I: |M| PICTURE ID | (REQUIRED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
M: | EXTENDED PID | (RECOMMENDED) M: | EXTENDED PID | (RECOMMENDED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
L: | TID |U| SID |D| (CONDITIONALLY RECOMMENDED) L: | TID |U| SID |D| (Conditionally RECOMMENDED)
+-+-+-+-+-+-+-+-+ -\ +-+-+-+-+-+-+-+-+ -\
P,F: | P_DIFF |N| (CONDITIONALLY REQUIRED) - up to 3 times P,F: | P_DIFF |N| (Conditionally REQUIRED) - up to 3 times
+-+-+-+-+-+-+-+-+ -/ +-+-+-+-+-+-+-+-+ -/
V: | SS | V: | SS |
| .. | | .. |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
Figure 2 Figure 2
In non-flexible mode (with the F bit below set to 0), the first In non-flexible mode (with the F bit below set to 0), the first
octets after the RTP header are the VP9 payload descriptor, with the octets after the RTP header are the VP9 payload descriptor, with the
following structure. following structure.
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|I|P|L|F|B|E|V|Z| (REQUIRED) |I|P|L|F|B|E|V|Z| (REQUIRED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
I: |M| PICTURE ID | (RECOMMENDED) I: |M| PICTURE ID | (RECOMMENDED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
M: | EXTENDED PID | (RECOMMENDED) M: | EXTENDED PID | (RECOMMENDED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
L: | TID |U| SID |D| (CONDITIONALLY RECOMMENDED) L: | TID |U| SID |D| (Conditionally RECOMMENDED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
| TL0PICIDX | (CONDITIONALLY REQUIRED) | TL0PICIDX | (Conditionally REQUIRED)
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
V: | SS | V: | SS |
| .. | | .. |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
Figure 3 Figure 3
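Informative sketch (not part of either draft): the REQUIRED first octet shown in Figure 2 and Figure 3 can be split into its eight flag bits as below. The function name parse_flags and the dictionary layout are illustrative assumptions; the only thing taken from the figures is the bit order I, P, L, F, B, E, V, Z, with bit 0 being the most significant bit of the octet.

```python
from typing import Dict

# Flag names in the order drawn in Figures 2 and 3 (bit 0 is the MSB).
FLAG_NAMES = "IPLFBEVZ"


def parse_flags(first_octet: int) -> Dict[str, bool]:
    """Split the first payload descriptor octet into named flag bits.

    Hypothetical helper for illustration only.
    """
    if not 0 <= first_octet <= 0xFF:
        raise ValueError("first octet must fit in 8 bits")
    # 0x80 >> i selects bit i counted from the most significant bit.
    return {
        name: bool(first_octet & (0x80 >> i))
        for i, name in enumerate(FLAG_NAMES)
    }
```

A caller would typically test flags["F"] first to decide whether the flexible-mode layout (Figure 2) or the non-flexible layout (Figure 3) applies to the remaining octets.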
I: Picture ID (PID) present. When set to one, the OPTIONAL PID MUST I: Picture ID (PID) present. When set to one, the OPTIONAL PID MUST
be present after the mandatory first octet and specified as below. be present after the mandatory first octet and specified as below.
Otherwise, PID MUST NOT be present. If the SS field was present Otherwise, PID MUST NOT be present. If the SS field was present
skipping to change at page 9, line 27 skipping to change at page 9, line 27
The field MUST be present if the I bit is equal to one. If set, The field MUST be present if the I bit is equal to one. If set,
the PID field MUST contain 15 bits; otherwise, it MUST contain 7 the PID field MUST contain 15 bits; otherwise, it MUST contain 7
bits. See PID below. bits. See PID below.
Picture ID (PID): Picture ID represented in 7 or 15 bits, depending Picture ID (PID): Picture ID represented in 7 or 15 bits, depending
on the M bit. This is a running index of the pictures. The field on the M bit. This is a running index of the pictures. The field
MUST be present if the I bit is equal to one. If M is set to MUST be present if the I bit is equal to one. If M is set to
zero, 7 bits carry the PID; else if M is set to one, 15 bits carry zero, 7 bits carry the PID; else if M is set to one, 15 bits carry
the PID in network byte order. The sender may choose between a 7- the PID in network byte order. The sender may choose between a 7-
or 15-bit index. The PID SHOULD start on a random number, and or 15-bit index. The PID SHOULD start on a random number, and
MUST wrap after reaching the maximum ID. The receiver MUST NOT MUST wrap after reaching the maximum ID (0x7f or 0x7fff depending
assume that the number of bits in PID stays the same through the on the index size chosen). The receiver MUST NOT assume that the
session. number of bits in PID stays the same through the session.
In the non-flexible mode (when the F bit is set to 0), this PID is In the non-flexible mode (when the F bit is set to 0), this PID is
used as an index to the picture group (PG) specified in the SS used as an index to the picture group (PG) specified in the SS
data below. In this mode, the PID of the key frame corresponds to data below. In this mode, the PID of the key frame corresponds to
the first specified frame in the PG. Then subsequent PIDs are the first specified frame in the PG. Then subsequent PIDs are
mapped to subsequently specified frames in the PG (modulo N_G, mapped to subsequently specified frames in the PG (modulo N_G,
specified in the SS data below), respectively. specified in the SS data below), respectively.
All frames of the same picture MUST have the same PID value. All frames of the same picture MUST have the same PID value.
Frames (and their corresponding pictures) with the VP9 show_frame Frames (and their corresponding pictures) with the VP9 show_frame
field equal to 0 MUST have distinct PID values from subsequent field equal to 0 MUST have distinct PID values from subsequent
pictures with show_frame equal to 1. Thus, a Picture as defined pictures with show_frame equal to 1. Thus, a Picture as defined
in this specification is different than a VP9 Superframe. in this specification is different than a VP9 Superframe.
All frames of the same picture MUST have the same value for All frames of the same picture MUST have the same value for
show_frame. show_frame.
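Informative sketch (not part of either draft): the PID encoding described above, with the M bit selecting a 7-bit or 15-bit index in network byte order and wrap-around at 0x7f or 0x7fff, might be handled as follows. The names parse_pid and next_pid are hypothetical, and the input is assumed to start at the octet carrying the M bit.

```python
from typing import Tuple


def parse_pid(data: bytes, offset: int = 0) -> Tuple[int, int, int]:
    """Return (pid, pid_bits, next_offset) for the PID field at offset.

    Hypothetical helper: assumes the I bit was set, so the field is present.
    """
    if offset >= len(data):
        raise ValueError("truncated payload descriptor")
    first = data[offset]
    if first & 0x80:  # M bit set: 15-bit PID spread over two octets
        if offset + 1 >= len(data):
            raise ValueError("truncated extended PID")
        pid = ((first & 0x7F) << 8) | data[offset + 1]
        return pid, 15, offset + 2
    # M bit clear: the remaining 7 bits carry the PID
    return first & 0x7F, 7, offset + 1


def next_pid(pid: int, pid_bits: int) -> int:
    """Advance a running PID, wrapping after 0x7f (7 bits) or 0x7fff (15 bits)."""
    return (pid + 1) & ((1 << pid_bits) - 1)
```

Because the receiver cannot assume a fixed PID width for the whole session, the sketch returns the width alongside the value instead of caching it.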
Layer indices: This information is optional but recommended whenever Layer indices: This information is optional but RECOMMENDED whenever
encoding with layers. For both flexible and non-flexible modes, encoding with layers. For both flexible and non-flexible modes,
one octet is used to specify a layer frame's temporal layer ID one octet is used to specify a layer frame's temporal layer ID
(TID) and spatial layer ID (SID) as shown both in Figure 2 and (TID) and spatial layer ID (SID) as shown both in Figure 2 and
Figure 3. Additionally, a bit (U) is used to indicate that the Figure 3. Additionally, a bit (U) is used to indicate that the
current frame is a "switching up point" frame. Another bit (D) is current frame is a "switching up point" frame. Another bit (D) is
used to indicate whether inter-layer prediction is used for the used to indicate whether inter-layer prediction is used for the
current frame. current frame.
In the non-flexible mode (when the F bit is set to 0), another In the non-flexible mode (when the F bit is set to 0), another
octet is used to represent temporal layer 0 index (TL0PICIDX), as octet is used to represent temporal layer 0 index (TL0PICIDX), as
skipping to change at page 10, line 37 skipping to change at page 10, line 37
the current picture (in coding order) with temporal layer ID the current picture (in coding order) with temporal layer ID
greater than TID. greater than TID.
SID: The spatial layer ID of current frame. Note that frames SID: The spatial layer ID of current frame. Note that frames
with spatial layer SID > 0 may be dependent on decoded spatial with spatial layer SID > 0 may be dependent on decoded spatial
layer SID-1 frame within the same picture. Different frames of layer SID-1 frame within the same picture. Different frames of
the same picture MUST have distinct spatial layer IDs, and the same picture MUST have distinct spatial layer IDs, and
frames' spatial layers MUST appear in increasing order within frames' spatial layers MUST appear in increasing order within
the frame. the frame.
D: Inter-layer dependency used. MUST be set to one if current D: Inter-layer dependency used. MUST be set to one if and only
spatial layer SID frame depends on spatial layer SID-1 frame of if the current spatial layer SID frame depends on spatial layer
the same picture. MUST only be set to zero if current spatial SID-1 frame of the same picture; otherwise, it MUST be set to zero.
layer SID frame does not depend on spatial layer SID-1 frame of For the base layer frame (with SID equal to 0), this D bit MUST
the same picture. For the base layer frame (with SID equal to be set to zero.
0), this D bit MUST be set to zero.
TL0PICIDX: 8 bits temporal layer zero index. TL0PICIDX is only TL0PICIDX: 8 bits temporal layer zero index. TL0PICIDX is only
present in the non-flexible mode (F = 0). This is a running present in the non-flexible mode (F = 0). This is a running
index for the temporal base layer pictures, i.e., the pictures index for the temporal base layer pictures, i.e., the pictures
with TID set to 0. If TID is larger than 0, TL0PICIDX with TID set to 0. If TID is larger than 0, TL0PICIDX
indicates which temporal base layer picture the current picture indicates which temporal base layer picture the current picture
depends on. TL0PICIDX MUST be incremented when TID is equal to depends on. TL0PICIDX MUST be incremented when TID is equal to
0. The index SHOULD start on a random number, and MUST restart 0. The index SHOULD start on a random number, and MUST restart
at 0 after reaching the maximum number 255. at 0 after reaching the maximum number 255.
Reference indices: When P and F are both set to one, indicating a Reference indices: When P and F are both set to one, indicating a
non-key frame in flexible mode, then at least one reference index non-key frame in flexible mode, then at least one reference index
has to be specified as below. Additional reference indices (total MUST be specified as below. Additional reference indices (total
of up to 3 reference indices are allowed) may be specified using of up to 3 reference indices are allowed) may be specified using
the N bit below. When either P or F is set to zero, then no the N bit below. When either P or F is set to zero, then no
reference index is specified. reference index is specified.
P_DIFF: The reference index (in 7 bits) specified as the relative P_DIFF: The reference index (in 7 bits) specified as the relative
PID from the current picture. For example, P_DIFF=3 on a PID from the current picture. For example, P_DIFF=3 on a
packet containing the picture with PID 112 means that the packet containing the picture with PID 112 means that the
picture refers back to the picture with PID 109. This picture refers back to the picture with PID 109. This
calculation is done modulo the size of the PID field, i.e., calculation is done modulo the size of the PID field, i.e.,
either 7 or 15 bits. either 7 or 15 bits.
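Informative sketch (not part of either draft): the P_DIFF calculation above is a subtraction modulo the size of the PID field. The function name referenced_pid is a hypothetical helper; pid_bits is assumed to be the width (7 or 15) of the PID carried in the same packet.

```python
def referenced_pid(pid: int, p_diff: int, pid_bits: int) -> int:
    """Return the PID of the picture referenced by a P_DIFF value.

    Hypothetical helper: the subtraction wraps modulo 2**pid_bits.
    """
    if pid_bits not in (7, 15):
        raise ValueError("PID is carried in either 7 or 15 bits")
    if not 0 <= p_diff <= 0x7F:
        raise ValueError("P_DIFF is a 7-bit field")
    return (pid - p_diff) % (1 << pid_bits)
```

With the example from the text, referenced_pid(112, 3, 7) yields 109. Near a wrap, the same P_DIFF on PID 1 yields 126 for a 7-bit PID.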
skipping to change at page 16, line 18 skipping to change at page 16, line 18
{1,0} to a receiver and which wants to upgrade to {2,1}. In response {1,0} to a receiver and which wants to upgrade to {2,1}. In response
the encoder should encode the next frames in layers {1,1} and {2,1} the encoder should encode the next frames in layers {1,1} and {2,1}
by only referring to frames in {1,0}, or {0,0}. by only referring to frames in {1,0}, or {0,0}.
In the non-flexible mode, periodic upgrade frames can be defined by In the non-flexible mode, periodic upgrade frames can be defined by
the layer structure of the SS, thus periodic upgrade frames can be the layer structure of the SS, thus periodic upgrade frames can be
automatically identified by the picture ID. automatically identified by the picture ID.
6. Payload Format Parameters 6. Payload Format Parameters
This payload format has three optional parameters. This payload format has three optional parameters, "max-fr", "max-
fs", and "profile-id".
6.1. Media Type Definition
This registration is done using the template defined in [RFC6838] and
following [RFC4855].
Type name:
video
Subtype name:
VP9
Required parameters:
None.
Optional parameters:
The max-fr and max-fs parameters are used to signal the
capabilities of a receiver implementation. If the implementation
is willing to receive media, both parameters MUST be provided.
These parameters MUST NOT be used for any other purpose. A media
sender SHOULD NOT send media with a frame rate or frame size
exceeding the max-fr and max-fs values signaled. (There may be
scenarios, such as pre-encoded media or selective forwarding
middleboxes [RFC7667], where a media sender does not have media
available that fits within a receiver's max-fs and max-fr value; in
such scenarios, a sender MAY exceed the signaled values.)
max-fr: The value of max-fr is an integer indicating the maximum
frame rate in units of frames per second that the decoder is
capable of decoding.
max-fs: The value of max-fs is an integer indicating the maximum
frame size in units of macroblocks that the decoder is capable
of decoding.
The decoder is capable of decoding this frame size as long as
the width and height of the frame in macroblocks are less than
int(sqrt(max-fs * 8)) - for instance, a max-fs of 1200 (capable
of supporting 640x480 resolution) will support widths and
heights up to 1552 pixels (97 macroblocks).
profile-id: The value of profile-id is an integer indicating the
default coding profile (the subset of coding tools that may
have been used to generate the stream or that the receiver
supports). Table 2 lists all of the profiles defined in
section 7.2 of [VP9-BITSTREAM] and the corresponding integer
values to be used.
If no profile-id is present, Profile 0 MUST be inferred. (The
profile-id parameter was added relatively late in the
development of this specification, so some existing
implementations may not send it.)
Informative note: See Table 3 for capabilities of coding
profiles defined in section 7.2 of [VP9-BITSTREAM].
Encoding considerations:
This media type is framed in RTP and contains binary data; see
Section 4.8 of [RFC6838].
Security considerations:
See Section 7 of RFC XXXX.
[RFC Editor: Upon publication as an RFC, please replace "XXXX"
with the number assigned to this document and remove this note.]
Interoperability considerations:
None.
Published specification:
VP9 bitstream format [VP9-BITSTREAM] and RFC XXXX.
[RFC Editor: Upon publication as an RFC, please replace "XXXX"
with the number assigned to this document and remove this note.]
Applications which use this media type:
For example: Video over IP, video conferencing.
Fragment identifier considerations:
N/A.
Additional information:
None.
Person & email address to contact for further information:
Jonathan Lennox <jonathan.lennox@8x8.com>
Intended usage:
COMMON
Restrictions on usage:
This media type depends on RTP framing, and hence is only defined
for transfer via RTP [RFC3550].
Author:
Jonathan Lennox <jonathan.lennox@8x8.com>
Change controller:
IETF AVTCore Working Group delegated from the IESG.
The max-fr and max-fs parameters are used to signal the capabilities
of a receiver implementation. If the implementation is willing to
receive media, both parameters MUST be provided. These parameters
MUST NOT be used for any other purpose. A media sender SHOULD NOT
send media with a frame rate or frame size exceeding the max-fr and
max-fs values signaled. (There may be scenarios, such as pre-encoded
media or selective forwarding middleboxes [RFC7667], where a media
sender does not have media available that fits within a receiver's
max-fs and max-fr value; in such scenarios, a sender MAY exceed the
signaled values.)
max-fr: The value of max-fr is an integer indicating the maximum
frame rate in units of frames per second that the decoder is
capable of decoding.
max-fs: The value of max-fs is an integer indicating the maximum
frame size in units of macroblocks that the decoder is capable of
decoding.
The decoder is capable of decoding this frame size as long as the
width and height of the frame in macroblocks are less than
int(sqrt(max-fs * 8)) - for instance, a max-fs of 1200 (capable of
supporting 640x480 resolution) will support widths and heights up
to 1552 pixels (97 macroblocks).
profile-id: The value of profile-id is an integer indicating the
default coding profile (the subset of coding tools that may have
been used to generate the stream or that the receiver supports).
Table 2 lists all of the profiles defined in section 7.2 of
[VP9-BITSTREAM] and the corresponding integer values to be used.
If no profile-id is present, Profile 0 MUST be inferred. (The
profile-id parameter was added relatively late in the development
of this specification, so some existing implementations may not
send it.)
Informative note: See Table 3 for capabilities of coding profiles
defined in section 7.2 of [VP9-BITSTREAM].
A receiver MUST ignore any parameter unspecified in this
specification.
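Informative sketch (not part of either draft): the max-fs rule above can be checked numerically as below. Two points are assumptions rather than statements from the text: a macroblock is taken to be 16x16 pixels (consistent with 640x480 corresponding to 1200 macroblocks), and the per-dimension bound int(sqrt(max-fs * 8)) is treated as inclusive, because the worked example states that 97 macroblocks (1552 pixels) is supported for max-fs=1200 even though the prose says "less than". The function name fits_max_fs is hypothetical.

```python
import math

MACROBLOCK_SIZE = 16  # assumption: 16x16-pixel macroblocks


def fits_max_fs(width_px: int, height_px: int, max_fs: int) -> bool:
    """Check a frame size against a signaled max-fs value (hypothetical helper)."""
    # Round each dimension up to whole macroblocks.
    width_mb = (width_px + MACROBLOCK_SIZE - 1) // MACROBLOCK_SIZE
    height_mb = (height_px + MACROBLOCK_SIZE - 1) // MACROBLOCK_SIZE
    # Per-dimension limit; math.isqrt gives the integer square root.
    limit = math.isqrt(max_fs * 8)
    return (
        width_mb * height_mb <= max_fs
        and width_mb <= limit
        and height_mb <= limit
    )
```

For max-fs=1200 the limit evaluates to 97 macroblocks, so a 640x480 frame (40x30 macroblocks) fits, while a 1568-pixel-wide frame (98 macroblocks) does not.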
+=========+============+ +=========+============+
| Profile | profile-id | | Profile | profile-id |
+=========+============+ +=========+============+
| 0 | 0 | | 0 | 0 |
+---------+------------+ +---------+------------+
| 1 | 1 | | 1 | 1 |
+---------+------------+ +---------+------------+
| 2 | 2 | | 2 | 2 |
+---------+------------+ +---------+------------+
| 3 | 3 | | 3 | 3 |
+---------+------------+ +---------+------------+
Table 2: Table 1. Table of profile-id integer values Table 2: Table of profile-id integer values
representing the VP9 profile corresponding to the set of representing the VP9 profile corresponding to the set of
coding tools supported. coding tools supported.
+=========+===========+=================+==========================+ +=========+===========+=================+==========================+
| Profile | Bit Depth | SRGB Colorspace | Chroma Subsampling | | Profile | Bit Depth | SRGB Colorspace | Chroma Subsampling |
+=========+===========+=================+==========================+ +=========+===========+=================+==========================+
| 0 | 8 | No | YUV 4:2:0 | | 0 | 8 | No | YUV 4:2:0 |
+---------+-----------+-----------------+--------------------------+ +---------+-----------+-----------------+--------------------------+
| 1 | 8 | Yes | YUV 4:2:0,4:4:0 or 4:4:4 | | 1 | 8 | Yes | YUV 4:2:0,4:4:0 or 4:4:4 |
+---------+-----------+-----------------+--------------------------+ +---------+-----------+-----------------+--------------------------+
| 2 | 10 or 12 | No | YUV 4:2:0 | | 2 | 10 or 12 | No | YUV 4:2:0 |
+---------+-----------+-----------------+--------------------------+ +---------+-----------+-----------------+--------------------------+
| 3 | 10 or 12 | Yes | YUV 4:2:0,4:4:0 or 4:4:4 | | 3 | 10 or 12 | Yes | YUV 4:2:0,4:4:0 or 4:4:4 |
+---------+-----------+-----------------+--------------------------+ +---------+-----------+-----------------+--------------------------+
Table 3: Table 2. Table of profile capabilities. Table 3: Table of profile capabilities.
6.2. SDP Parameters
The receiver MUST ignore any fmtp parameter unspecified in this memo.
6.2.1. Mapping of Media Subtype Parameters to SDP 6.1. SDP Parameters
6.1.1. Mapping of Media Subtype Parameters to SDP
The media type video/VP9 string is mapped to fields in the Session The media type video/VP9 string is mapped to fields in the Session
Description Protocol (SDP) [RFC8866] as follows: Description Protocol (SDP) [RFC8866] as follows:
* The media name in the "m=" line of SDP MUST be video. * The media name in the "m=" line of SDP MUST be video.
* The encoding name in the "a=rtpmap" line of SDP MUST be VP9 (the * The encoding name in the "a=rtpmap" line of SDP MUST be VP9 (the
media subtype). media subtype).
* The clock rate in the "a=rtpmap" line MUST be 90000. * The clock rate in the "a=rtpmap" line MUST be 90000.
skipping to change at page 19, line 47 skipping to change at page 18, line 28
receiver capabilities. These parameters are expressed as a media receiver capabilities. These parameters are expressed as a media
subtype string, in the form of a semicolon separated list of subtype string, in the form of a semicolon separated list of
parameter=value pairs. parameter=value pairs.
* The OPTIONAL parameter profile-id, when present, SHOULD be * The OPTIONAL parameter profile-id, when present, SHOULD be
included in the "a=fmtp" line of SDP. This parameter is expressed included in the "a=fmtp" line of SDP. This parameter is expressed
as a media subtype string, in the form of a parameter=value pair. as a media subtype string, in the form of a parameter=value pair.
When the parameter is not present, a value of 0 MUST be inferred When the parameter is not present, a value of 0 MUST be inferred
for profile-id. for profile-id.
6.2.1.1. Example 6.1.1.1. Example
An example of media representation in SDP is as follows: An example of media representation in SDP is as follows:
m=video 49170 RTP/AVPF 98 m=video 49170 RTP/AVPF 98
a=rtpmap:98 VP9/90000 a=rtpmap:98 VP9/90000
a=fmtp:98 max-fr=30;max-fs=3600;profile-id=0 a=fmtp:98 max-fr=30;max-fs=3600;profile-id=0
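A receiver-side reading of the "a=fmtp" line above might look like the following sketch. It assumes the three optional parameters carry integer values, skips any parameter this document does not specify, and infers profile-id 0 when the parameter is absent. The function name parse_vp9_fmtp and the result layout are hypothetical.

```python
from typing import Dict

# Optional parameters named by this document; anything else is ignored.
KNOWN_PARAMS = ("max-fr", "max-fs", "profile-id")


def parse_vp9_fmtp(line: str) -> Dict[str, int]:
    """Parse 'a=fmtp:<pt> name=value;name=value' into a dict of integers."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, params = line[len(prefix):].strip().partition(" ")
    # profile-id defaults to 0 when it is not present.
    result = {"payload_type": int(payload_type), "profile-id": 0}
    for item in params.split(";"):
        name, sep, value = item.strip().partition("=")
        if not sep or name not in KNOWN_PARAMS:
            continue  # unspecified or malformed parameter: ignore it
        result[name] = int(value)
    return result
```

Applied to the example above, the sketch yields payload type 98 with max-fr 30, max-fs 3600, and profile-id 0.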
6.2.2. Offer/Answer Considerations 6.1.2. Offer/Answer Considerations
When VP9 is offered over RTP using SDP in an Offer/Answer model When VP9 is offered over RTP using SDP in an Offer/Answer model
[RFC3264] for negotiation for unicast usage, the following [RFC3264] for negotiation for unicast usage, the following
limitations and rules apply: limitations and rules apply:
* The parameter identifying a media format configuration for VP9 is * The parameter identifying a media format configuration for VP9 is
profile-id. This media format configuration parameter MUST be profile-id. This media format configuration parameter MUST be
used symmetrically; that is, the answerer MUST either maintain used symmetrically; that is, the answerer MUST either maintain
this configuration parameter or remove the media format (payload this configuration parameter or remove the media format (payload
type) completely if it is not supported. type) completely if it is not supported.
skipping to change at page 20, line 37 skipping to change at page 19, line 21
* To simplify the handling and matching of these configurations, the * To simplify the handling and matching of these configurations, the
same RTP payload type number used in the offer SHOULD also be used same RTP payload type number used in the offer SHOULD also be used
in the answer and in a subsequent offer, as specified in in the answer and in a subsequent offer, as specified in
[RFC3264]. An answer or subsequent offer MUST NOT contain the [RFC3264]. An answer or subsequent offer MUST NOT contain the
payload type number used in the offer unless the profile-id value payload type number used in the offer unless the profile-id value
is exactly the same as in the original offer. However, max-fr and is exactly the same as in the original offer. However, max-fr and
max-fs parameters MAY be changed in subsequent offers and answers, max-fs parameters MAY be changed in subsequent offers and answers,
with the same payload type number, if an endpoint wishes to change with the same payload type number, if an endpoint wishes to change
its declared receiver capabilities. its declared receiver capabilities.
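The symmetric handling of profile-id described in the rules above can be sketched as follows. This is a simplified model, not a full SDP implementation: offers and answers are represented as dictionaries keyed by payload type number, and the name build_vp9_answer and its arguments are hypothetical.

```python
from typing import Dict, Set


def build_vp9_answer(
    offered: Dict[int, Dict[str, int]],
    supported_profiles: Set[int],
    local_caps: Dict[str, int],
) -> Dict[int, Dict[str, int]]:
    """Build answer fmtp parameters for each offered VP9 payload type.

    offered:            payload type -> fmtp parameters from the offer
    supported_profiles: profile-id values the answerer can handle
    local_caps:         the answerer's own max-fr / max-fs declarations
    """
    answer: Dict[int, Dict[str, int]] = {}
    for payload_type, params in offered.items():
        profile = params.get("profile-id", 0)  # absent means profile 0
        if profile not in supported_profiles:
            continue  # unsupported: remove the payload type completely
        # max-fr and max-fs describe the answerer, so they may differ
        # from the offer; profile-id is kept exactly as offered.
        reply = dict(local_caps)
        if "profile-id" in params:
            reply["profile-id"] = profile
        answer[payload_type] = reply  # same payload type number as offered
    return answer
```

The sketch reuses the offered payload type number and never rewrites profile-id, which matches the rule that a payload type number must not be reused with a different profile-id value.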
7. Security Considerations 7. Media Type Definition
This registration is done using the template defined in [RFC6838] and
following [RFC4855].
Type name:
video
Subtype name:
VP9
Required parameters:
N/A.
Optional parameters:
There are three optional parameters, "max-fr", "max-fs", and
"profile-id". See Section 6 for their definition.
Encoding considerations:
This media type is framed in RTP and contains binary data; see
Section 4.8 of [RFC6838].
Security considerations:
See Section 8 of RFC XXXX.
[RFC Editor: Upon publication as an RFC, please replace "XXXX"
with the number assigned to this document and remove this note.]
Interoperability considerations:
None.
Published specification:
VP9 bitstream format [VP9-BITSTREAM] and RFC XXXX.
[RFC Editor: Upon publication as an RFC, please replace "XXXX"
with the number assigned to this document and remove this note.]
Applications which use this media type:
For example: Video over IP, video conferencing.
Fragment identifier considerations:
N/A.
Additional information:
None.
Person & email address to contact for further information:
Jonathan Lennox <jonathan.lennox@8x8.com>
Intended usage:
COMMON
Restrictions on usage:
This media type depends on RTP framing, and hence is only defined
for transfer via RTP [RFC3550].
Author:
Jonathan Lennox <jonathan.lennox@8x8.com>
Change controller:
IETF AVTCore Working Group delegated from the IESG.
8. Security Considerations
RTP packets using the payload format defined in this specification RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP are subject to the security considerations discussed in the RTP
specification [RFC3550], and in any applicable RTP profile such as specification [RFC3550], and in any applicable RTP profile such as
RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/ RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/
SAVPF [RFC5124]. However, as "Securing the RTP SAVPF [RFC5124]. However, as "Securing the RTP
Protocol Framework: Why RTP Does Not Mandate a Single Media Security Protocol Framework: Why RTP Does Not Mandate a Single Media Security
Solution" [RFC7202] discusses, it is not an RTP payload format's Solution" [RFC7202] discusses, it is not an RTP payload format's
responsibility to discuss or mandate what solutions are used to meet responsibility to discuss or mandate what solutions are used to meet
the basic security goals like confidentiality, integrity and source the basic security goals like confidentiality, integrity and source
skipping to change at page 21, line 13 skipping to change at page 21, line 30
mechanisms. The rest of this security consideration section mechanisms. The rest of this security consideration section
discusses the security impacting properties of the payload format discusses the security impacting properties of the payload format
itself. itself.
This RTP payload format and its media decoder do not exhibit any This RTP payload format and its media decoder do not exhibit any
significant non-uniformity in the receiver-side computational significant non-uniformity in the receiver-side computational
complexity for packet processing, and thus are unlikely to pose a complexity for packet processing, and thus are unlikely to pose a
denial-of-service threat due to the receipt of pathological data. denial-of-service threat due to the receipt of pathological data.
Nor does the RTP payload format contain any active content. Nor does the RTP payload format contain any active content.
8. Congestion Control 9. Congestion Control
Congestion control for RTP SHALL be used in accordance with RFC 3550 Congestion control for RTP SHALL be used in accordance with RFC 3550
[RFC3550], and with any applicable RTP profile; e.g., RFC 3551 [RFC3550], and with any applicable RTP profile; e.g., RFC 3551
[RFC3551]. The congestion control mechanism can, in a real-time [RFC3551]. The congestion control mechanism can, in a real-time
encoding scenario, adapt the transmission rate by instructing the encoding scenario, adapt the transmission rate by instructing the
encoder to encode at a certain target rate. Media aware network encoder to encode at a certain target rate. Media aware network
elements MAY use the information in the VP9 payload descriptor in elements MAY use the information in the VP9 payload descriptor in
Section 4.2 to identify non-reference frames and discard them in Section 4.2 to identify non-reference frames and discard them in
order to reduce network congestion. Note that discarding of non- order to reduce network congestion. Note that discarding of non-
reference frames cannot be done if the stream is encrypted (because reference frames cannot be done if the stream is encrypted (because
the non-reference marker is encrypted). the non-reference marker is encrypted).
9. IANA Considerations 10. IANA Considerations
The IANA is requested to register the media type registration "video/ The IANA is requested to register the media type registration "video/
vp9" as specified in Section 6.1. The media type is also requested vp9" as specified in Section 7. The media type is also requested to
to be added to the IANA registry for "RTP Payload Format MIME types" be added to the IANA registry for "RTP Payload Format MIME types"
<http://www.iana.org/assignments/rtp-parameters>. <http://www.iana.org/assignments/rtp-parameters>.
10. Acknowledgments 11. Acknowledgments
Alex Eleftheriadis, Yuki Ito, Won Kap Jang, Sergio Garcia Murillo, Alex Eleftheriadis, Yuki Ito, Won Kap Jang, Sergio Garcia Murillo,
Roi Sasson, Timothy Terriberry, Emircan Uysaler, and Thomas Volkert Roi Sasson, Timothy Terriberry, Emircan Uysaler, and Thomas Volkert
commented on the development of this document and provided helpful commented on the development of this document and provided helpful
comments and feedback. comments and feedback.
11. References 12. References
11.1. Normative References 12.1. Normative References
[I-D.ietf-avtext-lrr] [I-D.ietf-avtext-lrr]
Lennox, J., Hong, D., Uberti, J., Holmer, S., and M. Lennox, J., Hong, D., Uberti, J., Holmer, S., and M.
Flodman, "The Layer Refresh Request (LRR) RTCP Feedback Flodman, "The Layer Refresh Request (LRR) RTCP Feedback
Message", Work in Progress, Internet-Draft, draft-ietf- Message", Work in Progress, Internet-Draft, draft-ietf-
avtext-lrr-07, 2 July 2017, <http://www.ietf.org/internet- avtext-lrr-07, 2 July 2017,
drafts/draft-ietf-avtext-lrr-07.txt>. <https://www.ietf.org/archive/id/draft-ietf-avtext-lrr-
07.txt>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
with Session Description Protocol (SDP)", RFC 3264, with Session Description Protocol (SDP)", RFC 3264,
DOI 10.17487/RFC3264, June 2002, DOI 10.17487/RFC3264, June 2002,
<https://www.rfc-editor.org/info/rfc3264>. <https://www.rfc-editor.org/info/rfc3264>.
skipping to change at page 23, line 8 skipping to change at page 23, line 27
<https://www.rfc-editor.org/info/rfc8866>. <https://www.rfc-editor.org/info/rfc8866>.
[VP9-BITSTREAM] [VP9-BITSTREAM]
Grange, A., de Rivaz, P., and J. Hunt, "VP9 Bitstream & Grange, A., de Rivaz, P., and J. Hunt, "VP9 Bitstream &
Decoding Process Specification", Version 0.6, 31 March Decoding Process Specification", Version 0.6, 31 March
2016, 2016,
<https://storage.googleapis.com/downloads.webmproject.org/ <https://storage.googleapis.com/downloads.webmproject.org/
docs/vp9/vp9-bitstream-specification- docs/vp9/vp9-bitstream-specification-
v0.6-20160331-draft.pdf>. v0.6-20160331-draft.pdf>.
11.2. Informative References 12.2. Informative References
[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
Video Conferences with Minimal Control", STD 65, RFC 3551, Video Conferences with Minimal Control", STD 65, RFC 3551,
DOI 10.17487/RFC3551, July 2003, DOI 10.17487/RFC3551, July 2003,
<https://www.rfc-editor.org/info/rfc3551>. <https://www.rfc-editor.org/info/rfc3551>.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", Norrman, "The Secure Real-time Transport Protocol (SRTP)",
RFC 3711, DOI 10.17487/RFC3711, March 2004, RFC 3711, DOI 10.17487/RFC3711, March 2004,
<https://www.rfc-editor.org/info/rfc3711>. <https://www.rfc-editor.org/info/rfc3711>.
 End of changes. 42 change blocks. 
163 lines changed or deleted 166 lines changed or added