draft-ietf-payload-vp9-11.txt   draft-ietf-payload-vp9-12.txt 
AVTCore Working Group J. Uberti AVTCore Working Group J. Uberti
Internet-Draft S. Holmer Internet-Draft S. Holmer
Intended status: Standards Track M. Flodman Intended status: Standards Track M. Flodman
Expires: 6 August 2021 D. Hong Expires: 3 October 2021 D. Hong
Google Google
J. Lennox J. Lennox
8x8 / Jitsi 8x8 / Jitsi
2 February 2021 1 April 2021
RTP Payload Format for VP9 Video RTP Payload Format for VP9 Video
draft-ietf-payload-vp9-11 draft-ietf-payload-vp9-12
Abstract Abstract
This memo describes an RTP payload format for the VP9 video codec. This memo describes an RTP payload format for the VP9 video codec.
The payload format has wide applicability, as it supports The payload format has wide applicability, as it supports
applications from low bit-rate peer-to-peer usage, to high bit-rate applications from low bit-rate peer-to-peer usage, to high bit-rate
video conferences. It includes provisions for temporal and spatial video conferences. It includes provisions for temporal and spatial
scalability. scalability.
Status of This Memo Status of This Memo
skipping to change at page 1, line 38 skipping to change at page 1, line 38
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 6 August 2021. This Internet-Draft will expire on 3 October 2021.
Copyright Notice Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 27 skipping to change at page 2, line 27
4.3. Frame Fragmentation . . . . . . . . . . . . . . . . . . . 12 4.3. Frame Fragmentation . . . . . . . . . . . . . . . . . . . 12
4.4. Scalable encoding considerations . . . . . . . . . . . . 13 4.4. Scalable encoding considerations . . . . . . . . . . . . 13
4.5. Examples of VP9 RTP Stream . . . . . . . . . . . . . . . 13 4.5. Examples of VP9 RTP Stream . . . . . . . . . . . . . . . 13
4.5.1. Reference picture use for scalable structure . . . . 13 4.5.1. Reference picture use for scalable structure . . . . 13
5. Feedback Messages and Header Extensions . . . . . . . . . . . 14 5. Feedback Messages and Header Extensions . . . . . . . . . . . 14
5.1. Reference Picture Selection Indication (RPSI) . . . . . . 14 5.1. Reference Picture Selection Indication (RPSI) . . . . . . 14
5.2. Full Intra Request (FIR) . . . . . . . . . . . . . . . . 15 5.2. Full Intra Request (FIR) . . . . . . . . . . . . . . . . 15
5.3. Layer Refresh Request (LRR) . . . . . . . . . . . . . . . 15 5.3. Layer Refresh Request (LRR) . . . . . . . . . . . . . . . 15
6. Payload Format Parameters . . . . . . . . . . . . . . . . . . 16 6. Payload Format Parameters . . . . . . . . . . . . . . . . . . 16
6.1. Media Type Definition . . . . . . . . . . . . . . . . . . 16 6.1. Media Type Definition . . . . . . . . . . . . . . . . . . 16
6.2. SDP Parameters . . . . . . . . . . . . . . . . . . . . . 18 6.2. SDP Parameters . . . . . . . . . . . . . . . . . . . . . 19
6.2.1. Mapping of Media Subtype Parameters to SDP . . . . . 19 6.2.1. Mapping of Media Subtype Parameters to SDP . . . . . 19
6.2.2. Offer/Answer Considerations . . . . . . . . . . . . . 19 6.2.2. Offer/Answer Considerations . . . . . . . . . . . . . 20
7. Security Considerations . . . . . . . . . . . . . . . . . . . 20 7. Security Considerations . . . . . . . . . . . . . . . . . . . 20
8. Congestion Control . . . . . . . . . . . . . . . . . . . . . 20 8. Congestion Control . . . . . . . . . . . . . . . . . . . . . 21
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21
10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 21 10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 21
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 21 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 21
11.1. Normative References . . . . . . . . . . . . . . . . . . 21 11.1. Normative References . . . . . . . . . . . . . . . . . . 21
11.2. Informative References . . . . . . . . . . . . . . . . . 22 11.2. Informative References . . . . . . . . . . . . . . . . . 23
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 23 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 23
1. Introduction 1. Introduction
This memo describes an RTP payload specification applicable to the This memo describes an RTP payload specification applicable to the
transmission of video streams encoded using the VP9 video codec transmission of video streams encoded using the VP9 video codec
[VP9-BITSTREAM]. The format described in this document can be used [VP9-BITSTREAM]. The format described in this document can be used
both in peer-to-peer and video conferencing applications. both in peer-to-peer and video conferencing applications.
The VP9 video codec was developed by Google, and is the successor to The VP9 video codec was developed by Google, and is the successor to
its earlier VP8 [RFC6386] codec. Beyond the compression improvements its earlier VP8 [RFC6386] codec. Beyond the compression improvements
and other general enhancements over VP8, VP9 is also designed in a and other general enhancements over VP8, VP9 is also designed in a
way that allows spatially-scalable video encoding. way that allows spatially-scalable video encoding.
2. Conventions, Definitions and Acronyms 2. Conventions, Definitions and Acronyms
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
document are to be interpreted as described in [RFC2119]. "OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
3. Media Format Description 3. Media Format Description
The VP9 codec can maintain up to eight reference frames, of which up The VP9 codec can maintain up to eight reference frames, of which up
to three can be referenced by any new frame. to three can be referenced by any new frame.
VP9 also allows a frame to use another frame of a different VP9 also allows a frame to use another frame of a different
resolution as a reference frame. (Specifically, a frame may use any resolution as a reference frame. (Specifically, a frame may use any
references whose width and height are between 1/16th that of the references whose width and height are between 1/16th that of the
current frame and twice that of the current frame, inclusive.) This current frame and twice that of the current frame, inclusive.) This
skipping to change at page 10, line 15 skipping to change at page 10, line 15
used to indicate whether inter-layer prediction is used for the used to indicate whether inter-layer prediction is used for the
current frame. current frame.
In the non-flexible mode (when the F bit is set to 0), another In the non-flexible mode (when the F bit is set to 0), another
octet is used to represent temporal layer 0 index (TL0PICIDX), as octet is used to represent temporal layer 0 index (TL0PICIDX), as
depicted in Figure 3. The TL0PICIDX is present so that all depicted in Figure 3. The TL0PICIDX is present so that all
minimally required frames - the base temporal layer frames - can minimally required frames - the base temporal layer frames - can
be tracked. be tracked.
The TID and SID fields indicate the temporal and spatial layers The TID and SID fields indicate the temporal and spatial layers
and can help middleboxes and and endpoints quickly identify which and can help middleboxes and endpoints quickly identify which
layer a packet belongs to. layer a packet belongs to.
TID: The temporal layer ID of current frame. In the case of non- TID: The temporal layer ID of current frame. In the case of non-
flexible mode, if PID is mapped to a picture in a specified PG, flexible mode, if PID is mapped to a picture in a specified PG,
then the value of TID MUST match the corresponding TID value of then the value of TID MUST match the corresponding TID value of
the mapped picture in the PG. the mapped picture in the PG.
U: Switching up point. If this bit is set to 1 for the current U: Switching up point. If this bit is set to 1 for the current
picture with temporal layer ID equal to TID, then "switch up" picture with temporal layer ID equal to TID, then "switch up"
to a higher frame rate is possible as subsequent higher to a higher frame rate is possible as subsequent higher
temporal layer pictures will not depend on any picture before temporal layer pictures will not depend on any picture before
the current picture (in coding order) with temporal layer ID the current picture (in coding order) with temporal layer ID
greater than TID. greater than TID.
SID: The spatial layer ID of current frame. Note that frames SID: The spatial layer ID of current frame. Note that frames
with spatial layer SDI > 0 may be dependent on decoded spatial with spatial layer SID > 0 may be dependent on decoded spatial
layer SID-1 frame within the same picture. Different frames of layer SID-1 frame within the same picture. Different frames of
the same picture MUST have distinct spatial layer IDs, and the same picture MUST have distinct spatial layer IDs, and
frames' spatial layers MUST appear in increasing order within frames' spatial layers MUST appear in increasing order within
the frame. the frame.
D: Inter-layer dependency used. MUST be set to one if current D: Inter-layer dependency used. MUST be set to one if current
spatial layer SID frame depends on spatial layer SID-1 frame of spatial layer SID frame depends on spatial layer SID-1 frame of
the same picture. MUST only be set to zero if current spatial the same picture. MUST only be set to zero if current spatial
layer SID frame does not depend on spatial layer SID-1 frame of layer SID frame does not depend on spatial layer SID-1 frame of
the same picture. For the base layer frame (with SID equal to the same picture. For the base layer frame (with SID equal to
skipping to change at page 16, line 18 skipping to change at page 16, line 18
{1,0} to a receiver and which wants to upgrade to {2,1}. In response {1,0} to a receiver and which wants to upgrade to {2,1}. In response
the encoder should encode the next frames in layers {1,1} and {2,1} the encoder should encode the next frames in layers {1,1} and {2,1}
by only referring to frames in {1,0}, or {0,0}. by only referring to frames in {1,0}, or {0,0}.
In the non-flexible mode, periodic upgrade frames can be defined by In the non-flexible mode, periodic upgrade frames can be defined by
the layer structure of the SS, thus periodic upgrade frames can be the layer structure of the SS, thus periodic upgrade frames can be
automatically identified by the picture ID. automatically identified by the picture ID.
6. Payload Format Parameters 6. Payload Format Parameters
This payload format has two optional parameters. This payload format has three optional parameters.
6.1. Media Type Definition 6.1. Media Type Definition
This registration is done using the template defined in [RFC6838] and This registration is done using the template defined in [RFC6838] and
following [RFC4855]. following [RFC4855].
Type name: Type name:
video video
Subtype name: Subtype name:
VP9 VP9
Required parameters: Required parameters:
None. None.
Optional parameters: Optional parameters:
These parameters are used to signal the capabilities of a receiver The max-fr and max-fs parameters are used to signal the
implementation. If the implementation is willing to receive capabilities of a receiver implementation. If the implementation
media, both parameters MUST be provided. These parameters MUST is willing to receive media, both parameters MUST be provided.
NOT be used for any other purpose. These parameters MUST NOT be used for any other purpose. A media
sender SHOULD NOT send media with a frame rate or frame size
exceeding the max-fr and max-fs values signaled. (There may be
scenarios, such as pre-encoded media or selective forwarding
middleboxes [RFC7667], where a media sender does not have media
available that fits within a receiver's max-fs and max-fr values; in
such scenarios, a sender MAY exceed the signaled values.)
max-fr: The value of max-fr is an integer indicating the maximum max-fr: The value of max-fr is an integer indicating the maximum
frame rate in units of frames per second that the decoder is frame rate in units of frames per second that the decoder is
capable of decoding. capable of decoding.
max-fs: The value of max-fs is an integer indicating the maximum max-fs: The value of max-fs is an integer indicating the maximum
frame size in units of macroblocks that the decoder is capable frame size in units of macroblocks that the decoder is capable
of decoding. of decoding.
The decoder is capable of decoding this frame size as long as The decoder is capable of decoding this frame size as long as
skipping to change at page 17, line 12 skipping to change at page 17, line 18
of supporting 640x480 resolution) will support widths and of supporting 640x480 resolution) will support widths and
heights up to 1552 pixels (97 macroblocks). heights up to 1552 pixels (97 macroblocks).
profile-id: The value of profile-id is an integer indicating the profile-id: The value of profile-id is an integer indicating the
default coding profile (the subset of coding tools that may default coding profile (the subset of coding tools that may
have been used to generate the stream or that the receiver have been used to generate the stream or that the receiver
supports). Table 2 lists all of the profiles defined in supports). Table 2 lists all of the profiles defined in
section 7.2 of [VP9-BITSTREAM] and the corresponding integer section 7.2 of [VP9-BITSTREAM] and the corresponding integer
values to be used. values to be used.
If no profile-id is present, Profile 0 MUST be inferred. If no profile-id is present, Profile 0 MUST be inferred. (The
profile-id parameter was added relatively late in the
development of this specification, so some existing
implementations may not send it.)
Informative note: See Table 3 for capabilities of coding Informative note: See Table 3 for capabilities of coding
profiles defined in section 7.2 of [VP9-BITSTREAM]. profiles defined in section 7.2 of [VP9-BITSTREAM].
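As a rough illustration of how a sender might honor the max-fr and max-fs values described above, the following Python sketch checks a candidate frame against a receiver's declared limits. The function names are hypothetical. The per-dimension bound of floor(sqrt(max-fs * 8)) macroblocks is an assumption that is not spelled out in the text shown here; it is used because it reproduces the 97-macroblock (1552-pixel) figure quoted above for a decoder capable of 640x480. Macroblocks of 16x16 pixels are likewise assumed.

```python
import math

MACROBLOCK_PIXELS = 16  # assumed macroblock edge length, in pixels


def to_macroblocks(pixels):
    """Round a pixel dimension up to whole macroblocks."""
    return -(-pixels // MACROBLOCK_PIXELS)


def frame_size_macroblocks(width, height):
    """Frame size in macroblocks, the unit used by max-fs."""
    return to_macroblocks(width) * to_macroblocks(height)


def max_dimension_macroblocks(max_fs):
    """Assumed per-dimension bound: floor(sqrt(max-fs * 8))."""
    return math.isqrt(max_fs * 8)


def fits_receiver(width, height, frame_rate, max_fr, max_fs):
    """Return True if the frame stays within the declared limits."""
    bound = max_dimension_macroblocks(max_fs)
    return (
        frame_rate <= max_fr
        and frame_size_macroblocks(width, height) <= max_fs
        and to_macroblocks(width) <= bound
        and to_macroblocks(height) <= bound
    )
```

Because exceeding the signaled values is a SHOULD NOT rather than a MUST NOT, a False result here is advisory: a sender of pre-encoded media or a selective forwarding middlebox may have nothing smaller to send.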
Encoding considerations: Encoding considerations:
This media type is framed in RTP and contains binary data; see This media type is framed in RTP and contains binary data; see
Section 4.8 of [RFC6838]. Section 4.8 of [RFC6838].
Security considerations: Security considerations:
See Section 7 of RFC xxxx. See Section 7 of RFC xxxx.
skipping to change at page 19, line 17 skipping to change at page 19, line 35
The media type video/VP9 string is mapped to fields in the Session The media type video/VP9 string is mapped to fields in the Session
Description Protocol (SDP) [RFC8866] as follows: Description Protocol (SDP) [RFC8866] as follows:
* The media name in the "m=" line of SDP MUST be video. * The media name in the "m=" line of SDP MUST be video.
* The encoding name in the "a=rtpmap" line of SDP MUST be VP9 (the * The encoding name in the "a=rtpmap" line of SDP MUST be VP9 (the
media subtype). media subtype).
* The clock rate in the "a=rtpmap" line MUST be 90000. * The clock rate in the "a=rtpmap" line MUST be 90000.
* The parameters "max-fs", and "max-fr", MUST be included in the * The parameters "max-fr" and "max-fs" MUST be included in the
"a=fmtp" line of SDP if SDP is used to declare receiver "a=fmtp" line of SDP if the receiver wishes to declare its
capabilities. These parameters are expressed as a media subtype receiver capabilities. These parameters are expressed as a media
string, in the form of a semicolon separated list of subtype string, in the form of a semicolon separated list of
parameter=value pairs. parameter=value pairs.
* The OPTIONAL parameter profile-id, when present, SHOULD be * The OPTIONAL parameter profile-id, when present, SHOULD be
included in the "a=fmtp" line of SDP. This parameter is expressed included in the "a=fmtp" line of SDP. This parameter is expressed
as a media subtype string, in the form of a parameter=value pair. as a media subtype string, in the form of a parameter=value pair.
When the parameter is not present, a value of 0 MUST be used for When the parameter is not present, a value of 0 MUST be inferred
profile-id. for profile-id.
6.2.1.1. Example 6.2.1.1. Example
An example of media representation in SDP is as follows: An example of media representation in SDP is as follows:
m=video 49170 RTP/AVPF 98 a=rtpmap:98 VP9/90000 a=fmtp:98 max-fr=30; m=video 49170 RTP/AVPF 98
max-fs=3600; profile-id=0; a=rtpmap:98 VP9/90000
a=fmtp:98 max-fr=30;max-fs=3600;profile-id=0
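The "a=fmtp" line in the example above can be read with a few lines of code. The following Python sketch is illustrative only: parse_vp9_fmtp is a hypothetical helper, it assumes all three parameters carry integer values, and it applies the rule that a missing profile-id is inferred as 0. It tolerates optional whitespace and a trailing semicolon so that both renderings of the example parse identically.

```python
def parse_vp9_fmtp(line):
    """Parse an 'a=fmtp:<pt> name=value;...' line for video/VP9.

    Returns (payload_type, params), where params maps parameter
    names to integers. A missing profile-id is inferred as 0.
    """
    prefix, _, param_text = line.strip().partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    payload_type = int(prefix[len("a=fmtp:"):])

    params = {}
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue  # ignore empty pieces such as a trailing ';'
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("malformed parameter: " + item)
        params[name.strip()] = int(value)

    # If no profile-id is present, Profile 0 is inferred.
    params.setdefault("profile-id", 0)
    return payload_type, params
```

A caller that treats max-fr and max-fs as a capability declaration would additionally check that both are present, since the text requires them to be provided together.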
6.2.2. Offer/Answer Considerations 6.2.2. Offer/Answer Considerations
When VP9 is offered over RTP using SDP in an Offer/Answer model When VP9 is offered over RTP using SDP in an Offer/Answer model
[RFC3264] for negotiation for unicast usage, the following [RFC3264] for negotiation for unicast usage, the following
limitations and rules apply: limitations and rules apply:
* The parameter identifying a media format configuration for VP9 is * The parameter identifying a media format configuration for VP9 is
profile-id. This media format configuration parameter MUST be profile-id. This media format configuration parameter MUST be
used symmetrically; that is, the answerer MUST either maintain all used symmetrically; that is, the answerer MUST either maintain
configuration parameters or remove the media format (payload type) this configuration parameter or remove the media format (payload
completely if one or more of the parameter values are not type) completely if it is not supported.
supported.
* The max-fr and max-fs parameters are used declaratively to
describe receiver capabilities, even in the Offer/Answer model.
The values in an answer are used to describe the answerer's
capabilities, and thus their values are set independently of the
values in the offer.
* To simplify the handling and matching of these configurations, the * To simplify the handling and matching of these configurations, the
same RTP payload type number used in the offer SHOULD also be used same RTP payload type number used in the offer SHOULD also be used
in the answer, as specified in [RFC3264]. An answer MUST NOT in the answer and in a subsequent offer, as specified in
contain the payload type number used in the offer unless the [RFC3264]. An answer or subsequent offer MUST NOT contain the
configuration is exactly the same as in the offer. payload type number used in the offer unless the profile-id value
is exactly the same as in the original offer. However, max-fr and
max-fs parameters MAY be changed in subsequent offers and answers,
with the same payload type number, if an endpoint wishes to change
its declared receiver capabilities.
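The payload type rule above can be sketched as a small consistency check. This Python example is a minimal illustration under stated assumptions: the offer and answer are represented as hypothetical dictionaries mapping payload type numbers to parsed fmtp parameters, and only VP9 payload types are considered. It compares profile-id alone, treating a missing value as 0, and deliberately ignores max-fr and max-fs because those are set independently by each side.

```python
def effective_profile_id(params):
    """profile-id as signaled, or 0 when the parameter is absent."""
    return params.get("profile-id", 0)


def reused_payload_types_ok(offer, answer):
    """Check that every payload type reused from the offer keeps
    the same profile-id. max-fr and max-fs are not compared."""
    for payload_type, answer_params in answer.items():
        if payload_type not in offer:
            continue  # a new payload type number is not constrained here
        if effective_profile_id(offer[payload_type]) != effective_profile_id(answer_params):
            return False
    return True
```

The same check applies to a subsequent offer by passing it in place of the answer.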
7. Security Considerations 7. Security Considerations
RTP packets using the payload format defined in this specification RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP are subject to the security considerations discussed in the RTP
specification [RFC3550], and in any applicable RTP profile such as specification [RFC3550], and in any applicable RTP profile such as
RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/ RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/
SAVPF [RFC5124]. However, as "Securing the RTP SAVPF [RFC5124]. However, as "Securing the RTP
Protocol Framework: Why RTP Does Not Mandate a Single Media Security Protocol Framework: Why RTP Does Not Mandate a Single Media Security
Solution" [RFC7202] discusses, it is not an RTP payload format's Solution" [RFC7202] discusses, it is not an RTP payload format's
skipping to change at page 22, line 19 skipping to change at page 22, line 40
[RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
"Codec Control Messages in the RTP Audio-Visual Profile "Codec Control Messages in the RTP Audio-Visual Profile
with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104, with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
February 2008, <https://www.rfc-editor.org/info/rfc5104>. February 2008, <https://www.rfc-editor.org/info/rfc5104>.
[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type
Specifications and Registration Procedures", BCP 13, Specifications and Registration Procedures", BCP 13,
RFC 6838, DOI 10.17487/RFC6838, January 2013, RFC 6838, DOI 10.17487/RFC6838, January 2013,
<https://www.rfc-editor.org/info/rfc6838>. <https://www.rfc-editor.org/info/rfc6838>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: [RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP:
Session Description Protocol", RFC 8866, Session Description Protocol", RFC 8866,
DOI 10.17487/RFC8866, January 2021, DOI 10.17487/RFC8866, January 2021,
<https://www.rfc-editor.org/info/rfc8866>. <https://www.rfc-editor.org/info/rfc8866>.
[VP9-BITSTREAM] [VP9-BITSTREAM]
Grange, A., de Rivaz, P., and J. Hunt, "VP9 Bitstream & Grange, A., de Rivaz, P., and J. Hunt, "VP9 Bitstream &
Decoding Process Specification", Version 0.6, 31 March Decoding Process Specification", Version 0.6, 31 March
2016, 2016,
<https://storage.googleapis.com/downloads.webmproject.org/ <https://storage.googleapis.com/downloads.webmproject.org/
skipping to change at page 23, line 14 skipping to change at page 23, line 39
[RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP
Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014, Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
<https://www.rfc-editor.org/info/rfc7201>. <https://www.rfc-editor.org/info/rfc7201>.
[RFC7202] Perkins, C. and M. Westerlund, "Securing the RTP [RFC7202] Perkins, C. and M. Westerlund, "Securing the RTP
Framework: Why RTP Does Not Mandate a Single Media Framework: Why RTP Does Not Mandate a Single Media
Security Solution", RFC 7202, DOI 10.17487/RFC7202, April Security Solution", RFC 7202, DOI 10.17487/RFC7202, April
2014, <https://www.rfc-editor.org/info/rfc7202>. 2014, <https://www.rfc-editor.org/info/rfc7202>.
[RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
DOI 10.17487/RFC7667, November 2015,
<https://www.rfc-editor.org/info/rfc7667>.
Authors' Addresses Authors' Addresses
Justin Uberti Justin Uberti
Google, Inc. Google, Inc.
747 6th Street South 747 6th Street South
Kirkland, WA 98033 Kirkland, WA 98033
United States of America United States of America
Email: justin@uberti.name Email: justin@uberti.name
Stefan Holmer Stefan Holmer
Google, Inc. Google, Inc.
Kungsbron 2 Kungsbron 2
SE-111 22 Stockholm SE-111 22 Stockholm
Sweden Sweden
Email: holmer@google.com Email: holmer@google.com
Magnus Flodman Magnus Flodman
Google, Inc. Google, Inc.
 End of changes. 22 change blocks. 
34 lines changed or deleted 62 lines changed or added
