draft-ietf-avt-rtp-mpeg4-es-03.txt   draft-ietf-avt-rtp-mpeg4-es-04.txt 
Internet Engineering Task Force Yoshihiro Kikuchi - Toshiba Internet Engineering Task Force Yoshihiro Kikuchi - Toshiba
Internet Draft Toshiyuki Nomura - NEC Internet Draft Toshiyuki Nomura - NEC
Document: draft-ietf-avt-rtp-mpeg4-es-03.txt Shigeru Fukunaga - Oki Document: draft-ietf-avt-rtp-mpeg4-es-04.txt Shigeru Fukunaga - Oki
Yoshinori Matsui - Matsushita Yoshinori Matsui - Matsushita
Hideaki Kimata - NTT Hideaki Kimata - NTT
Aug 21, 2000 September 18, 2000
RTP payload format for MPEG-4 Audio/Visual streams RTP payload format for MPEG-4 Audio/Visual streams
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with all This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026 [1]. provisions of Section 10 of RFC2026 [1].
Internet-Drafts are working documents of the Internet Engineering Task Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups Force (IETF), its areas, and its working groups. Note that other groups
skipping to change at page 1, line 31 skipping to change at page 1, line 31
replaced, or obsoleted by other documents at any time. It is replaced, or obsoleted by other documents at any time. It is
inappropriate to use Internet-Drafts as reference material or to cite inappropriate to use Internet-Drafts as reference material or to cite
them other than as "work in progress." them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Abstract Abstract
This document describes RTP payload formats for carrying of MPEG-4 Audio This document describes respective RTP payload formats for carrying each
and Visual bitstreams without using MPEG-4 Systems. For the purpose of of MPEG-4 Audio and MPEG-4 Visual bitstreams without using MPEG-4
directly mapping MPEG-4 Audio/Visual bitstreams onto RTP packets, it Systems. For the purpose of directly mapping MPEG-4 Audio/Visual
provides specifications for the use of RTP header fields and also bitstreams onto RTP packets, it provides specifications for the use of
specifies fragmentation rules. It also provides specifications for MIME RTP header fields and also specifies fragmentation rules. It also
type registrations and the use of SDP. provides specifications for MIME type registrations and the use of SDP.
1. Introduction 1. Introduction
The RTP payload formats described in this document specify a way of how The RTP payload formats described in this document specify a way of how
MPEG-4 Audio and Visual streams [2][3][4][5] are to be fragmented and MPEG-4 Audio [3][5] and MPEG-4 Visual streams [2][4] are to be fragmented
mapped directly onto RTP packets. and mapped directly onto RTP packets.
These RTP payload formats enable to carry MPEG-4 Audio/Visual streams These RTP payload formats enable to carry MPEG-4 Audio/Visual streams
without using the synchronization and stream management functionality of without using the synchronization and stream management functionality of
MPEG-4 Systems [6]. Such RTP payload format would be used within systems MPEG-4 Systems [6]. Such RTP payload format will be used in systems that
where their own stream management functionality is provided and thus such have intrinsic stream management functionality and thus require no such
functionality in MPEG-4 Systems is not necessary. H.323 terminals are an functionality in MPEG-4 Systems. H.323 terminals are an example of such
example of such systems. MPEG-4 Audio/Visual streams are not managed by systems. MPEG-4 Audio/Visual streams are not managed by MPEG-4 Systems
MPEG-4 Systems Object Descriptors but by H.245. The streams are directly Object Descriptors but by H.245. The streams are directly mapped onto RTP
mapped onto RTP packets without using the synchronization functionality packets without using MPEG-4 Systems Sync Layer. Other examples are SIP
of MPEG-4 Systems. Other examples are SIP and RTSP where MIME and SDP are and RTSP where MIME and SDP are used. MIME types and SDP usages of the
used. MIME types and SDP usages of the RTP payload formats described in RTP payload formats described in this document are defined to directly
this document are defined to specify the attribute of Audio/Visual specify the attribute of Audio/Visual streams (e.g. media type,
streams (e.g. media type, packetization format and codec configuration) packetization format and codec configuration) without using MPEG-4
directly without using MPEG-4 Systems. Systems. It is basically the same approach as those taken by RTP payload
formats for the existing audio/video codecs. The obvious benefit is that
these MPEG-4 Audio/Visual RTP payload formats can be handled in a
unified way together with those formats defined for non-MPEG-4 codecs.
The semantics of RTP headers in such cases need to be clearly defined, The semantics of RTP headers in such cases need to be clearly defined,
including the association with MPEG-4 Audio/Visual data elements. In including the association with MPEG-4 Audio/Visual data elements. In
addition, it would be beneficial to define the fragmentation rules of RTP addition, it would be beneficial to define the fragmentation rules of RTP
packets for MPEG-4 Video streams so as to enhance error resiliency by packets for MPEG-4 Video streams so as to enhance error resiliency by
utilizing the error resilience tools provided inside the MPEG-4 Video utilizing the error resilience tools provided inside the MPEG-4 Video
stream. These issues, however, have yet to be addressed by other RTP stream. These issues, however, have yet to be addressed by other MPEG-4
payload format specifications. RTP payload format specifications.
1.1 MPEG-4 Visual RTP payload format 1.1 MPEG-4 Visual RTP payload format
MPEG-4 Visual is a visual coding standard with many new features: high MPEG-4 Visual is a visual coding standard with many new features: high
coding efficiency; high error resiliency; multiple, arbitrary shape coding efficiency; high error resiliency; multiple, arbitrary shape
object-based coding; etc. [2]. It covers a wide range of bitrate from object-based coding; etc. [2]. It covers a wide range of bitrate from
scores of Kbps to several Mbps. It also covers a wide variety of scores of Kbps to several Mbps. It also covers a wide variety of
networks, ranging from those guaranteed to be almost error-free to mobile networks, ranging from those guaranteed to be almost error-free to mobile
networks with high error rates. networks with high error rates.
With respect to the fragmentation rules for an MPEG-4 visual bitstream With respect to the fragmentation rules for an MPEG-4 visual bitstream
defined in this document, since MPEG-4 Visual is used for a wide variety defined in this document, since MPEG-4 Visual is used for a wide variety
of networks, it is desirable not to apply too much restriction on of networks, it is desirable not to apply too much restriction on
fragmentation, and a fragmentation rule such as "a single video packet fragmentation, and a fragmentation rule such as "a single video packet
shall always be mapped on a single RTP packet" may be inappropriate. On shall always be mapped on a single RTP packet" may be inappropriate. On
the other hand, careless, media unaware fragmentation may cause the other hand, careless, media unaware fragmentation may cause
degradation in error resiliency and bandwidth efficiency. The degradation in error resiliency and bandwidth efficiency. The
fragmentation rules described in this document are flexible but manage to fragmentation rules described in this document are flexible but manage to
define the minimum rules for preventing meaningless fragmentation and for define the minimum rules for preventing meaningless fragmentation while
utilizing the error resilience of MPEG-4 Visual. utilizing the error resilience functionalities of MPEG-4 Visual.
The fragmentation rule recommends not mapping more than one VOP onto an
RTP packet, so that the RTP timestamp uniquely indicates the VOP time
framing. On the other hand, MPEG-4 video may generate very small VOPs,
for example a not-coded VOP containing only a VOP header, or an
arbitrarily shaped VOP with a small number of coding blocks. To reduce
the overhead in such cases, the fragmentation rule permits concatenating
multiple VOPs in an RTP packet. (See fragmentation rule (4) in section
3.2 and the marker bit and timestamp in section 3.1.)
While the additional media specific RTP header defined for such video While the additional media specific RTP header defined for such video
coding tools as H.261 or MPEG-1/2 is effective in helping to recover coding tools as H.261 or MPEG-1/2 is effective in helping to recover
picture headers corrupted by packet losses, in MPEG-4 Visual there are picture headers corrupted by packet losses, MPEG-4 Visual has already
already error resilience functionalities for recovering corrupt headers, error resilience functionalities for recovering corrupt headers, and
and these can be used on RTP/IP networks, as well as on other networks. these can be used on RTP/IP networks as well as on other networks
(H.223/mobile, MPEG-2/TS, etc.) That is why no extra RTP header fields (H.223/mobile, MPEG-2/TS, etc.). Therefore, no extra RTP header fields
are defined in the MPEG-4 Visual RTP payload format proposed here. are defined in this MPEG-4 Visual RTP payload format.
1.2 MPEG-4 Audio RTP payload format 1.2 MPEG-4 Audio RTP payload format
MPEG-4 Audio is a new kind of audio standard that integrates many MPEG-4 Audio is a new kind of audio standard that integrates many
different types of audio coding tools. It also supports a mechanism for different types of audio coding tools. It also supports a mechanism for
representing synthesized sounds. Low-overhead MPEG-4 Audio Transport representing synthesized sounds. Low-overhead MPEG-4 Audio Transport
Multiplex (LATM) manages the sequences of audio data with relatively Multiplex (LATM) manages the sequences of audio data with relatively
small overhead. In audio-only applications, then, it is desirable for small overhead. In audio-only applications, then, it is desirable for
LATM-based MPEG-4 Audio bitstreams to be directly mapped onto the RTP LATM-based MPEG-4 Audio bitstreams to be directly mapped onto the RTP
packets without using MPEG-4 Systems. packets without using MPEG-4 Systems.
LATM has several multiplexing features:
- Carrying configuration information with audio data,
- Concatenation of multiple audio frames in one audio stream,
- Multiplexing multiple objects (programs),
- Multiplexing scalable layers.
In RTP transmission there is no need for the last two features, which
multiplex payloads of different objects and scalable layers into one RTP
packet. Therefore, these two features SHOULD NOT be used in applications
based on the RTP packetization specified by this document.
For transmission of scalable streams, audio data of each layer should be
packetized into different RTP packets. On the other hand, all
configuration data of the scalable streams are contained in one LATM
configuration data "StreamMuxConfig" and every scalable layer shares the
StreamMuxConfig. The mapping between each layer and its configuration
data is achieved by LATM header information attached to the audio data.
In order to indicate the dependency information of the scalable streams,
a restriction is applied to the dynamic assignment rule of payload type
(PT) values (see section 4.2).
For MPEG-4 Audio coding tools except synthesis tools, as is true for For MPEG-4 Audio coding tools except synthesis tools, as is true for
other audio coders, if the payload of a packet is a single audio frame, other audio coders, if the payload of a packet is a single audio frame,
packet loss will not impair the decodability of adjacent packets. On the packet loss will not impair the decodability of adjacent packets. On the
other hand, MPEG-4 Audio synthesis tools may be sensitive to error. For other hand, MPEG-4 Audio synthesis tools may be sensitive to error. For
example, an SA_access_unit in the payload may set a global value to a new example, an SA_access_unit in the payload may set a global value to a new
value, which is then referenced throughout the audio content to make a value, which is then referenced throughout the audio content to make a
macro change in the performance. In this case, an error in the payload macro change in the performance. In this case, an error in the payload
influences all audio data produced after the error. In order to enhance influences all audio data produced after the error. In order to enhance
error resiliency, the element of SA_access_unit that makes the above error resiliency, the element of SA_access_unit that makes the above
macro change should be transmitted across several SA_access_unit macro change should be transmitted across several SA_access_unit
skipping to change at page 4, line 10 skipping to change at page 4, line 37
port as the elementary stream. (see 6.2.1 "Start codes" of ISO/IEC 14496- port as the elementary stream. (see 6.2.1 "Start codes" of ISO/IEC 14496-
2 [2][9][4]) The configuration information MAY additionally be specified 2 [2][9][4]) The configuration information MAY additionally be specified
by some out-of-band means; in H.323 terminals, H.245 codepoint by some out-of-band means; in H.323 terminals, H.245 codepoint
"decoderConfigurationInformation" MAY be used for this purpose; in "decoderConfigurationInformation" MAY be used for this purpose; in
systems using MIME content type and SDP parameters, e.g. SIP and RTSP, systems using MIME content type and SDP parameters, e.g. SIP and RTSP,
the optional parameter "config" MAY be used to specify the configuration the optional parameter "config" MAY be used to specify the configuration
information. (see 5.1 and 5.2) information. (see 5.1 and 5.2)
When the short video header mode is used, the RTP payload format used MAY When the short video header mode is used, the RTP payload format used MAY
be that specified for H.263 in the relevant RFCs or in other relevant be that specified for H.263 in the relevant RFCs or in other relevant
standards. standards. (e.g., RFC 2190 or RFC 2429)
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC |M| PT | sequence number | RTP |V=2|P|X| CC |M| PT | sequence number | RTP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp | Header | timestamp | Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| synchronization source (SSRC) identifier | | synchronization source (SSRC) identifier |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| contributing source (CSRC) identifiers | | contributing source (CSRC) identifiers |
skipping to change at page 6, line 8 skipping to change at page 6, line 8
random, is added for security reasons. The detailed definition of the random, is added for security reasons. The detailed definition of the
timestamp is as follows: timestamp is as follows:
- For a video object plane, it is defined as vop_time_increment (in units - For a video object plane, it is defined as vop_time_increment (in units
of 1/vop_time_increment_resolution seconds) plus the cumulative number of 1/vop_time_increment_resolution seconds) plus the cumulative number
of whole seconds specified by modulo_time_base and, if present, of whole seconds specified by modulo_time_base and, if present,
time_code of Group_of_VideoObjectPlane() fields. time_code of Group_of_VideoObjectPlane() fields.
- In the case of interlaced video, a VOP will consist of lines from two - In the case of interlaced video, a VOP will consist of lines from two
fields, and the timestamp will indicate the composition time of the fields, and the timestamp will indicate the composition time of the
first field. first field.
- For a video object plane with short header, the timestamps (after the
first random timestamp) are equal to the presentation time sequence
associated with the semantics of the temporal_reference field.
Specifically, each timestamp value SHALL be calculated by rounding the
value of a precise clock that advances delta_time with each successive
video object plane with short header. The time increment SHOULD be
calculated as delta_time = ((temporal_reference + 256 -
(temporal_reference of previous VOP)) modulo 256) * 1001/30000 for each
successive video object plane with short header. The RTP timestamp
should be consistently rounded or truncated to the resolution of the
RTP timestamp field.
- When multiple VOPs are carried in the same RTP packet, the timestamp - When multiple VOPs are carried in the same RTP packet, the timestamp
indicates the earliest of the composition time within the VOPs carried indicates the earliest of the composition times within the VOPs carried
in the RTP packet. in the RTP packet. Timestamp information of the rest of the VOPs is
derived from the timestamp fields in the VOP header (modulo_time_base
and vop_time_increment), or from the temporal_reference field in the
case of short video header.
- If the RTP packet contains only configuration information and/or - If the RTP packet contains only configuration information and/or
Group_of_VideoObjectPlane() fields, the composition time of the next Group_of_VideoObjectPlane() fields, the composition time of the next
VOP in the coding order is used. VOP in the coding order is used.
- If the RTP packet contains only visual_object_sequence_end_code - If the RTP packet contains only visual_object_sequence_end_code
information, the composition time of the immediately preceding VOP in information, the composition time of the immediately preceding VOP in
the coding order is used. the coding order is used.
The resolution of the timestamp is set to its default value of 90KHz, The resolution of the timestamp is set to its default value of 90KHz,
unless specified by an out-of-band means (e.g. SDP parameter or MIME unless specified by an out-of-band means (e.g. SDP parameter or MIME
parameter as defined in section 5). parameter as defined in section 5).
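The timestamp definitions above can be sketched in code. The following is a minimal illustration, not a normative implementation: the function names, the first_timestamp parameter, and the choice of truncation (rather than rounding) are assumptions made for this example, and the 90 kHz clock is the default resolution stated above.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90000           # default timestamp resolution from the text
TICK = Fraction(1001, 30000)   # seconds per temporal_reference step


def vop_timestamp(whole_seconds, vop_time_increment,
                  vop_time_increment_resolution,
                  first_timestamp=0, clock_hz=RTP_CLOCK_HZ):
    """RTP timestamp of a VOP.

    whole_seconds is the cumulative number of whole seconds derived from
    modulo_time_base (and time_code, if present). first_timestamp stands
    for the random initial offset (hypothetical parameter name).
    """
    seconds = whole_seconds + Fraction(vop_time_increment,
                                       vop_time_increment_resolution)
    return (first_timestamp + int(seconds * clock_hz)) % 2**32


def short_header_timestamps(temporal_refs, first_timestamp=0,
                            clock_hz=RTP_CLOCK_HZ):
    """RTP timestamps for successive VOPs with short header.

    temporal_refs holds the 8-bit temporal_reference values in order.
    A precise clock is advanced by delta_time for each VOP and then
    consistently truncated to the RTP timestamp resolution.
    """
    timestamps = []
    elapsed = Fraction(0)   # precise clock, in seconds
    previous = None
    for tr in temporal_refs:
        if previous is not None:
            # The modulo handles wraparound of the 8-bit field.
            elapsed += ((tr + 256 - previous) % 256) * TICK
        previous = tr
        ticks = int(elapsed * clock_hz)
        timestamps.append((first_timestamp + ticks) % 2**32)
    return timestamps
```

Exact rational arithmetic is used so that the factor 1001/30000 does not accumulate floating-point error before the final truncation.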
skipping to change at page 6, line 43 skipping to change at page 7, line 7
header) or just after the header of the syntactically upper layer header) or just after the header of the syntactically upper layer
function. function.
(2) If one or more headers exist in the RTP payload, the RTP payload (2) If one or more headers exist in the RTP payload, the RTP payload
SHALL begin with the header of the syntactically highest function. SHALL begin with the header of the syntactically highest function.
Note: The visual_object_sequence_end_code is regarded as the lowest Note: The visual_object_sequence_end_code is regarded as the lowest
function. function.
(3) A header SHALL NOT be split into a plurality of RTP packets. (3) A header SHALL NOT be split into a plurality of RTP packets.
(4) Two or more VOPs SHOULD be fragmented into different RTP packets so (4) Different VOPs SHOULD be fragmented into different RTP packets so
that one RTP packet consists of the data bytes associated with a unique that one RTP packet consists of the data bytes associated with a unique
presentation time (that is indicated in the timestamp field in the RTP presentation time (that is indicated in the timestamp field in the RTP
packet header), with the exception that multiple VOPs MAY be carried packet header), with the exception that more than one integral number of
within one RTP packet if the size of the VOPs is small. consecutive VOPs MAY be carried within one RTP packet in the decoding
order if the size of the VOPs is small.
Note: When multiple VOPs are carried in one RTP payload, the presentation
time of the VOPs after the first one may be calculated by the decoder.
This operation is necessary only for RTP packets in which the marker bit
is equal to one and the beginning of the RTP payload corresponds to a start
code. (See timestamp and marker bit in section 3.1)
(5) A single video packet SHOULD NOT be split into a plurality of RTP (5) A single video packet SHOULD NOT be split into a plurality of RTP
packets. The size of a video packet SHOULD be adjusted in such a way that packets. The size of a video packet SHOULD be adjusted in such a way that
the resulting RTP packet is not larger than the path-MTU. A video packet the resulting RTP packet is not larger than the path-MTU. A video packet
MAY be split into a plurality of RTP packets when the size of the video MAY be split into a plurality of RTP packets when the size of the video
packet is large. packet is large.
Note: Rule (5) does not apply when the video packet is disabled by the Note: Rule (5) does not apply when the video packet is disabled by the
coder configuration (by setting resync_marker_disable in the VOL header coder configuration (by setting resync_marker_disable in the VOL header
to 1), or in coding tools where the video packet is not supported. In to 1), or in coding tools where the video packet is not supported. In
this case, a VOP MAY be split at arbitrary byte-positions. this case, a VOP MAY be split at arbitrary byte-positions.
Here, header means: Here, header means:
- Configuration information (Visual Object Sequence Header, Visual Object - Configuration information (Visual Object Sequence Header, Visual Object
Header and Video Object Layer Header) Header and Video Object Layer Header)
- visual_object_sequence_end_code - visual_object_sequence_end_code
- The header of the entry point function for an elementary stream - The header of the entry point function for an elementary stream
(Group_of_VideoObjectPlane() or the header of VideoObjectPlane(), (Group_of_VideoObjectPlane() or the header of VideoObjectPlane(),
video_plane_with_short_header(), MeshObject() or FaceObject()) video_plane_with_short_header(), MeshObject() or FaceObject())
- The video packet header (video_packet_header() excluding - The video packet header (video_packet_header() excluding
next_resync_marker()) next_resync_marker())
- The header of gob_layer() - The header of gob_layer()
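As an illustration of rule (5), the sketch below maps the video packets of one VOP onto RTP payloads. It is a simplified example under stated assumptions: the function and parameter names are hypothetical, max_payload is assumed to be the path-MTU minus all header overhead, and it is assumed to be larger than any video packet header so that rule (3) is not violated when a large video packet is split. It places each video packet in its own RTP payload, which is one conservative choice among those the rules permit.

```python
def fragment_vop(video_packets, max_payload):
    """Map the video packets of one VOP onto RTP payloads.

    video_packets: list of bytes objects, one per video packet, in
                   bitstream order.
    max_payload:   largest RTP payload size in bytes (hypothetical
                   parameter, header overhead already subtracted).
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    payloads = []
    for packet in video_packets:
        if len(packet) <= max_payload:
            # Rule (5): a video packet is normally kept whole.
            payloads.append(packet)
        else:
            # A large video packet MAY be split into several RTP
            # packets; the first fragment starts with its header.
            for start in range(0, len(packet), max_payload):
                payloads.append(packet[start:start + max_payload])
    return payloads
```

An encoder that adjusts the video packet size to the path-MTU, as the rule recommends, would never reach the splitting branch.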
skipping to change at page 11, line 55 skipping to change at page 11, line 55
include an indication bit "useSameStreamMux" and MAY include the include an indication bit "useSameStreamMux" and MAY include the
configuration information for audio compression "StreamMuxConfig". The configuration information for audio compression "StreamMuxConfig". The
useSameStreamMux bit indicates whether the StreamMuxConfig element in the useSameStreamMux bit indicates whether the StreamMuxConfig element in the
previous frame is applied in the current frame. previous frame is applied in the current frame.
4.2 Use of RTP Header Fields for MPEG-4 Audio 4.2 Use of RTP Header Fields for MPEG-4 Audio
Payload Type (PT): Payload type is to be specifically assigned as the Payload Type (PT): Payload type is to be specifically assigned as the
MPEG-4 Audio RTP payload format. If this assignment is to be carried out MPEG-4 Audio RTP payload format. If this assignment is to be carried out
dynamically, it can be performed by such out-of-band means as H.245, SDP, dynamically, it can be performed by such out-of-band means as H.245, SDP,
etc. etc. In the dynamic assignment of RTP payload types for scalable streams,
a different value should be assigned to each layer. The assigned values
should be in order of enhancement layer dependency, where the base layer has
the smallest value.
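A minimal sketch of such an assignment follows. The function name is hypothetical, and the dynamic range 96-127 is taken from the RTP audio/video profile rather than from this document.

```python
def assign_layer_payload_types(num_layers, first_pt=96):
    """Return one dynamic payload type per scalable layer.

    Index 0 is the base layer and receives the smallest value; each
    enhancement layer receives the next larger value. The range 96-127
    is the dynamic range of the RTP A/V profile (an assumption of this
    sketch, not stated in the text above).
    """
    last_pt = first_pt + num_layers - 1
    if num_layers < 1 or first_pt < 96 or last_pt > 127:
        raise ValueError("layers must fit in the dynamic range 96-127")
    return list(range(first_pt, last_pt + 1))
```

For example, three layers starting at 96 receive 96 for the base layer and 97 and 98 for the two enhancement layers.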
Marker (M) bit: The marker bit indicates audioMuxElement boundaries. It Marker (M) bit: The marker bit indicates audioMuxElement boundaries. It
is set to one to indicate that the RTP packet contains a complete is set to one to indicate that the RTP packet contains a complete
audioMuxElement or the last fragment of an audioMuxElement. audioMuxElement or the last fragment of an audioMuxElement.
Timestamp: The timestamp indicates composition time, or presentation time Timestamp: The timestamp indicates composition time, or presentation time
in a no-compositor decoder. Timestamps are recommended to start at a in a no-compositor decoder. Timestamps are recommended to start at a
random value for security reasons. random value for security reasons.
Unless specified by an out-of-band means, the resolution of the timestamp Unless specified by an out-of-band means, the resolution of the timestamp
skipping to change at page 13, line 15 skipping to change at page 13, line 20
Required parameters: none Required parameters: none
Optional parameters: Optional parameters:
rate: This parameter is used only for RTP transport. It indicates the rate: This parameter is used only for RTP transport. It indicates the
resolution of the timestamp field in the RTP header. If this parameter resolution of the timestamp field in the RTP header. If this parameter
is not specified, its default value of 90000 (90KHz) is used. is not specified, its default value of 90000 (90KHz) is used.
profile-level-id: A decimal representation of MPEG-4 Visual Profile profile-level-id: A decimal representation of MPEG-4 Visual Profile
Level indication value (profile_and_level_indication) defined in Table Level indication value (profile_and_level_indication) defined in Table
G-1 of ISO/IEC 14496-2 [2][4]. G-1 of ISO/IEC 14496-2 [2][4]. This parameter MAY be used in the
capability exchange or session setup procedure to indicate MPEG-4
Visual Profile and Level combination of which the MPEG-4 Visual codec
is capable. If this parameter is not specified by the procedure, its
default value of 1 (Simple Profile/Level 1) is used.
config: A hexadecimal representation of an octet string that expresses config: This parameter indicates the configuration of the
the MPEG-4 Visual configuration information, as defined in subclause corresponding MPEG-4 visual bitstream. It SHALL NOT be used to
6.2.1 Start codes of ISO/IEC14496-2[2][4][9]. The configuration indicate the codec capability in the capability exchange procedure. It
information is mapped onto the octet string in an MSB-first basis. The is a hexadecimal representation of an octet string that expresses the
first bit of the configuration information SHALL be located at the MSB MPEG-4 Visual configuration information, as defined in subclause 6.2.1
of the first octet. The configuration information indicated by this Start codes of ISO/IEC14496-2[2][4][9]. The configuration information
parameter SHALL be the same as the configuration information in the is mapped onto the octet string in an MSB-first basis. The first bit
of the configuration information SHALL be located at the MSB of the
first octet. The configuration information indicated by this parameter
SHALL be the same as the configuration information in the
corresponding MPEG-4 Visual stream, except for corresponding MPEG-4 Visual stream, except for
first_half_vbv_occupancy and latter_half_vbv_occupancy, if exist, first_half_vbv_occupancy and latter_half_vbv_occupancy, if exist,
which may vary in the repeated configuration information inside an which may vary in the repeated configuration information inside an
MPEG-4 Visual stream (See 6.2.1 Start codes of ISO/IEC14496-2). MPEG-4 Visual stream (See 6.2.1 Start codes of ISO/IEC14496-2).
The parameter "profile-level-id" MAY be used in the capability
exchange/announcement procedure to indicate MPEG-4 Visual Profile and
Level combination of which the MPEG-4 Visual codec is capable. The
parameter "config" MAY be used to indicate the configuration of the
corresponding MPEG-4 visual bitstream, but SHALL NOT be used to
indicate the codec capability in the capability exchange procedure.
Example usages for these parameters are: Example usages for these parameters are:
- MPEG-4 Visual Simple Profile/Level 1: - MPEG-4 Visual Simple Profile/Level 1:
Content-type: video/mp4v; profile-level-id=1 Content-type: video/mp4v; profile-level-id=1
- MPEG-4 Visual Core Profile/Level 2: - MPEG-4 Visual Core Profile/Level 2:
Content-type: video/mp4v; profile-level-id=34 Content-type: video/mp4v; profile-level-id=34
- MPEG-4 Visual Advanced Real Time Simple Profile/Level 1: - MPEG-4 Visual Advanced Real Time Simple Profile/Level 1:
Content-type: video/mp4v; profile-level-id=145 Content-type: video/mp4v; profile-level-id=145
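The following sketch shows one way such a Content-type line might be built, including the hexadecimal "config" string. The function name is hypothetical, and the "; " separator before "config" is an assumption modelled on the examples above. Encoding each octet as two hexadecimal digits, high nibble first, matches the MSB-first mapping described for "config".

```python
def format_mp4v_content_type(profile_level_id=None, config=None):
    """Build a Content-type line for video/mp4v.

    profile_level_id: decimal profile_and_level_indication, or None.
    config: configuration information as a bytes object, or None.
    """
    parts = ["Content-type: video/mp4v"]
    if profile_level_id is not None:
        parts.append(f"profile-level-id={profile_level_id}")
    if config is not None:
        # Two hex digits per octet, most significant nibble first.
        parts.append("config=" + config.hex().upper())
    return "; ".join(parts)
```

Calling format_mp4v_content_type(1) reproduces the Simple Profile/Level 1 example above.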
skipping to change at page 15, line 20 skipping to change at page 15, line 24
"a=fmtp" line to indicate the coder capability and configuration, "a=fmtp" line to indicate the coder capability and configuration,
respectively. These parameters are expressed as a MIME media type string, respectively. These parameters are expressed as a MIME media type string,
in the form of a semicolon-separated list of parameter=value pairs. in the form of a semicolon-separated list of parameter=value pairs.
The following are some examples of media representation in SDP: The following are some examples of media representation in SDP:
Simple Profile/Level 1, rate=90000(90KHz), "profile-level-id" and Simple Profile/Level 1, rate=90000(90KHz), "profile-level-id" and
"config" are present in "a=fmtp" line: "config" are present in "a=fmtp" line:
m=video 49170/2 RTP/AVP 98 m=video 49170/2 RTP/AVP 98
a=rtpmap:98 MP4V/90000 a=rtpmap:98 MP4V/90000
a=fmtp:98 profile-level-id=1; a=fmtp:98 profile-level-id=1;config=000001B001000001B50900000100
config=000001B001000001B5090000010000000120008440FA282C2090A21F 00000120008440FA282C2090A21F
Core Profile/Level 2, rate=90000(90KHz), "profile-level-id" is present in Core Profile/Level 2, rate=90000(90KHz), "profile-level-id" is present in
"a=fmtp" line: "a=fmtp" line:
m=video 49170/2 RTP/AVP 98 m=video 49170/2 RTP/AVP 98
a=rtpmap:98 MP4V/90000 a=rtpmap:98 MP4V/90000
a=fmtp:98 profile-level-id=34 a=fmtp:98 profile-level-id=34
Advanced Real Time Simple Profile/Level 1, rate=25(25Hz), "profile-level- Advanced Real Time Simple Profile/Level 1, rate=25(25Hz), "profile-level-
id" is present in "a=fmtp" line: id" is present in "a=fmtp" line:
m=video 49170/2 RTP/AVP 98 m=video 49170/2 RTP/AVP 98
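On the receiving side, an "a=fmtp" line such as those in the examples above could be split into its parameter=value pairs as sketched below. This is an illustrative parser only: the function name is hypothetical, and it assumes the whole attribute is given on a single line with the "config" value unwrapped.

```python
def parse_fmtp(line):
    """Parse 'a=fmtp:<pt> name=value;name=value' into (pt, params)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    pt_text, _, param_text = line[len(prefix):].partition(" ")
    params = {}
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip()
    return int(pt_text), params


pt, params = parse_fmtp(
    "a=fmtp:98 profile-level-id=1;"
    "config=000001B001000001B5090000010000000120008440FA282C2090A21F"
)
# The hexadecimal string maps back to the configuration octets.
config_octets = bytes.fromhex(params["config"])
```

The first four octets of config_octets are then 00 00 01 B0, the start of the configuration information carried in the example.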
skipping to change at page 15, line 48 skipping to change at page 16, line 4
MIME subtype name: MP4A MIME subtype name: MP4A
Required parameters: Required parameters:
rate: the rate parameter indicates the RTP time stamp clock rate. The rate: the rate parameter indicates the RTP time stamp clock rate. The
default value is 90000. Other rates CAN be specified only if they are default value is 90000. Other rates CAN be specified only if they are
set to the same value as the audio sampling rate (number of samples set to the same value as the audio sampling rate (number of samples
per second). per second).
Optional parameters: Optional parameters:
profile-level-id: a decimal representation of MPEG-4 Audio Profile profile-level-id: a decimal representation of MPEG-4 Audio Profile
Level indication value defined in ISO/IEC 14496-1 [11]. This parameter Level indication value defined in ISO/IEC 14496-1 [10]. This parameter
indicates which MPEG-4 Audio tool subsets the decoder is capable of indicates which MPEG-4 Audio tool subsets the decoder is capable of
using. using. If this parameter is not specified in the capability exchange
or session setup procedure, its default value of 30 (Natural Audio
Profile/Level 1) is used.
object: a decimal representation of the MPEG-4 Audio Object Type value object: a decimal representation of the MPEG-4 Audio Object Type value
defined in ISO/IEC 14496-3 [5]. This parameter specifies the tool to defined in ISO/IEC 14496-3 [5]. This parameter specifies the tool to
be used by the coder. It CAN be used to limit the capability within be used by the coder. It CAN be used to limit the capability within
the specified "profile-level-id". the specified "profile-level-id".
bitrate: the data rate for the audio bit stream. bitrate: the data rate for the audio bit stream.
cpresent: this parameter indicates whether audio payload configuration cpresent: this parameter indicates whether audio payload configuration
data has been multiplexed into an RTP payload (See section 4.1 in this data has been multiplexed into an RTP payload (See section 4.1 in this
document). document). The default value is 1.
config: a hexadecimal representation of an octet string that expresses config: a hexadecimal representation of an octet string that expresses
the audio payload configuration data "StreamMuxConfig", as defined in the audio payload configuration data "StreamMuxConfig", as defined in
ISO/IEC 14496-3 [5]. Configuration data is mapped onto the octet ISO/IEC 14496-3 [5]. Configuration data is mapped onto the octet
string on an MSB-first basis. The first bit of the configuration data string on an MSB-first basis. The first bit of the configuration data
SHALL be located at the MSB of the first octet. In the last octet, SHALL be located at the MSB of the first octet. In the last octet,
zero-padding bits, if necessary, shall follow the configuration data. zero-padding bits, if necessary, shall follow the configuration data.
If the size of the configuration data is quite large, such large If the size of the configuration data is quite large, such large
config data is RECOMMENDED to be indicated by in-band mode (cpresent config data is RECOMMENDED to be indicated by in-band mode (cpresent
is set to 1). is set to 1).
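The MSB-first mapping described for "config" can be illustrated with a
minimal Python sketch. It assumes the configuration data is already
available as a string of '0' and '1' characters; the function name
bits_to_config_hex is hypothetical and is not defined by this document
or by ISO/IEC 14496-3. Producing an actual StreamMuxConfig bit sequence
is out of scope here.

```python
def bits_to_config_hex(bits: str) -> str:
    """Pack a '0'/'1' bit string into octets, MSB first, and hex-encode it.

    Hypothetical helper for illustration only. The first bit lands in the
    MSB of the first octet; the last octet is zero-padded if needed.
    """
    if any(b not in "01" for b in bits):
        raise ValueError("bits must contain only '0' and '1'")
    # Append zero-padding bits so the length is a multiple of 8.
    padded = bits + "0" * (-len(bits) % 8)
    # Each group of 8 characters becomes one octet, leftmost bit as MSB.
    octets = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return octets.hex().upper()
```

For instance, a 13-bit input is padded with three zero bits to fill two
octets, so the resulting hexadecimal string always has an even number
of digits.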
skipping to change at page 17, line 45 skipping to change at page 17, line 50
semicolon separated list of parameter=value pairs. semicolon separated list of parameter=value pairs.
The following are some examples of the media representation in SDP: The following are some examples of the media representation in SDP:
For 6 kb/s CELP bitstreams (with an audio sampling rate of 8 kHz), For 6 kb/s CELP bitstreams (with an audio sampling rate of 8 kHz),
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 MP4A/8000 a=rtpmap:96 MP4A/8000
a=fmtp:96 profile-level-id=9;object=8;cpresent=0;config=9128B1071070 a=fmtp:96 profile-level-id=9;object=8;cpresent=0;config=9128B1071070
a=ptime:20 a=ptime:20
For 64 kb/s AAC LC stereo bitstreams with For 64 kb/s AAC LC stereo bitstreams (with an audio sampling rate of 24
( an audio sampling rate of 24
kHz), kHz),
m=audio 49230 RTP/AVP 96 m=audio 49230 RTP/AVP 96
a=rtpmap:96 MP4A/24000 a=rtpmap:96 MP4A/24000
a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0; a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0;
config=9122620000 config=9122620000
In the above two examples, audio configuration data is not multiplexed In the above two examples, audio configuration data is not multiplexed
into the RTP payload and is described only in SDP. Furthermore, the into the RTP payload and is described only in SDP. Furthermore, the
"clock rate" is set to the audio sampling rate. "clock rate" is set to the audio sampling rate.
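As a rough illustration of how a receiver might read such "a=fmtp"
lines, the following Python sketch splits the semicolon separated
parameter=value pairs into a dictionary. The function name
parse_mp4a_fmtp and the "payload-type" key are assumptions made for
this example. The defaults applied (profile-level-id=30, cpresent=1)
are the ones stated in the -04 text of the MP4A parameter descriptions.
All values are kept as strings, and a parameter that wraps onto a
second line is assumed to have been joined into one line beforehand.

```python
def parse_mp4a_fmtp(line: str) -> dict:
    """Parse an MP4A 'a=fmtp' line into a dict of parameter strings.

    Hypothetical helper for illustration only; it performs no semantic
    validation of the parameter values.
    """
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    # The payload type precedes the first space; parameters follow it.
    payload_type, _, params = line[len(prefix):].partition(" ")
    # Defaults assumed from the -04 parameter descriptions.
    result = {"profile-level-id": "30", "cpresent": "1"}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';' or extra spaces
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected parameter=value, got %r" % item)
        result[name.strip()] = value.strip()
    result["payload-type"] = payload_type
    return result
```

Applied to the CELP example above, this yields cpresent "0" and config
"9128B1071070", which tells the receiver that the configuration data is
carried only in SDP.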
skipping to change at page 19, line 12 skipping to change at page 19, line 19
6 ISO/IEC 14496-1:1999, "Information technology - Coding of audio-visual 6 ISO/IEC 14496-1:1999, "Information technology - Coding of audio-visual
objects - Part1: Systems", December 1999. objects - Part1: Systems", December 1999.
7 Bradner, S., "Key words for use in RFCs to Indicate Requirement 7 Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997 Levels", BCP 14, RFC 2119, March 1997
8 H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson "RTP: A Transport 8 H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson "RTP: A Transport
Protocol for Real Time Applications", RFC 1889, Internet Engineering Protocol for Real Time Applications", RFC 1889, Internet Engineering
Task Force, January 1996. Task Force, January 1996.
9 ISO/IEC 14496-2/COR1, "Information technology - Coding of audio-visual 9 ISO/IEC 14496-2:1999/COR1:2000, "Information technology - Coding of
objects - Part2: Visual, Technical corrigendum 1", March 2000. audio-visual objects - Part2: Visual, Technical corrigendum 1", August
2000.
10 ISO/IEC 14496-1:1999/FDAM1:2000, December 1999.
8. Author's Addresses 8. Author's Addresses
Yoshihiro Kikuchi Yoshihiro Kikuchi
Toshiba corporation Toshiba corporation
1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki, 212-8582, Japan 1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki, 212-8582, Japan
Email: yoshihiro.kikuchi@toshiba.co.jp Email: yoshihiro.kikuchi@toshiba.co.jp
Yoshinori Matsui Yoshinori Matsui
Matsushita Electric Industrial Co., LTD. Matsushita Electric Industrial Co., LTD.