draft-ietf-avt-jpeg-02.txt   draft-ietf-avt-jpeg-03.txt 
Internet Engineering Task Force Audio-Video Transport Working Group Internet Engineering Task Force Audio-Video Transport Working Group
INTERNET-DRAFT L. Berc INTERNET-DRAFT L. Berc
draft-ietf-avt-jpeg-02.txt Digital Equipment Corporation draft-ietf-avt-jpeg-03.txt Digital Equipment Corporation
W. Fenner W. Fenner
Xerox PARC Xerox PARC
R. Frederick R. Frederick
Xerox PARC Xerox PARC
S. McCanne S. McCanne
Lawrence Berkeley Laboratory Lawrence Berkeley Laboratory
November 21, 1995 July 7, 1996
Expires: 5/1/96 Expires: 1/1/97
RTP Payload Format for JPEG-compressed Video RTP Payload Format for JPEG-compressed Video
Status of this Memo Status of this Memo
This document is an Internet Draft. Internet Drafts are working docu- This document is an Internet Draft. Internet Drafts are working docu-
ments of the Internet Engineering Task Force (IETF), its Areas, and ments of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts. working documents as Internet Drafts.
skipping to change at page 4, line 19 skipping to change at page 4, line 19
set in the last packet of a frame. set in the last packet of a frame.
3.1. JPEG header 3.1. JPEG header
A special header is added to each packet that immediately follows the A special header is added to each packet that immediately follows the
RTP header: RTP header:
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MBZ | Fragment Offset | | Type specific | Fragment Offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Q | Width | Height | | Type | Q | Width | Height |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
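The 8-byte layout in the diagram above can be read with a few shifts. The following is a minimal sketch, not the draft's Appendix B code: the struct and function names are hypothetical, and it assumes the 24-bit Fragment Offset is carried in network byte order, as is conventional for RTP. Width and Height are copied through as raw bytes, since their units are not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the RTP/JPEG header in wire order (names are illustrative). */
struct rtp_jpeg_hdr {
    uint8_t  type_specific;   /* called MBZ in the -02 layout */
    uint32_t fragment_offset; /* 24-bit byte offset into the JPEG scan */
    uint8_t  type;
    uint8_t  q;
    uint8_t  width;           /* raw field value, not interpreted here */
    uint8_t  height;          /* raw field value, not interpreted here */
};

/* Parse the header that immediately follows the RTP header.
 * Returns 0 on success, -1 if fewer than 8 bytes are available. */
int rtp_jpeg_parse_hdr(const uint8_t *buf, size_t len,
                       struct rtp_jpeg_hdr *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 8)
        return -1;
    out->type_specific = buf[0];
    /* Assumption: most significant byte first. */
    out->fragment_offset = ((uint32_t)buf[1] << 16) |
                           ((uint32_t)buf[2] << 8) |
                           (uint32_t)buf[3];
    out->type = buf[4];
    out->q = buf[5];
    out->width = buf[6];
    out->height = buf[7];
    return 0;
}
```

A receiver would call rtp_jpeg_parse_hdr() on the payload of each packet and use fragment_offset to place the remaining bytes in its reassembly buffer.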
3.1.1. MBZ: 8 bits 3.1.1. Type specific: 8 bits
This field is reserved for future use and must be zero. Interpretation depends on the value of the type field.
3.1.2. Fragment Offset: 24 bits 3.1.2. Fragment Offset: 24 bits
The Fragment Offset is the data offset in bytes of the current The Fragment Offset is the data offset in bytes of the current
packet in the JPEG scan. packet in the JPEG scan.
3.1.3. Type: 8 bits 3.1.3. Type: 8 bits
The type field specifies the information that would otherwise be The type field specifies the information that would otherwise be
present in a JPEG abbreviated table-specification as well as the present in a JPEG abbreviated table-specification as well as the
skipping to change at page 5, line 24 skipping to change at page 5, line 24
inferred from the RTP/JPEG header. The scan is terminated either inferred from the RTP/JPEG header. The scan is terminated either
implicitly (i.e., the point at which the image is fully parsed), or implicitly (i.e., the point at which the image is fully parsed), or
explicitly with an EOI marker. The scan may be padded to arbitrary explicitly with an EOI marker. The scan may be padded to arbitrary
length with undefined bytes. (Existing hardware codecs generate length with undefined bytes. (Existing hardware codecs generate
extra lines at the bottom of a video frame and removal of these extra lines at the bottom of a video frame and removal of these
lines would require a Huffman-decoding pass over the data.) lines would require a Huffman-decoding pass over the data.)
As defined by JPEG, restart markers are the only type of marker As defined by JPEG, restart markers are the only type of marker
that may appear embedded in the entropy-coded segment. The ``type that may appear embedded in the entropy-coded segment. The ``type
code'' determines whether a restart interval is defined, and there- code'' determines whether a restart interval is defined, and there-
fore whether restart markers may be present, but none of the fore whether restart markers may be present. It also determines if
current codes permit them. Hardware JPEG implementations that can- the restart intervals will be aligned with RTP packets, allowing for
not tolerate such markers are known to exist. partial decode of frames, thus increasing resilience to packet
drop. If restart markers are present, the 6-byte DRI segment (define
restart interval marker [1, Sec. B.2.4.4]) precedes the scan.
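As an illustration of the 6-byte DRI segment mentioned above, the sketch below checks for one at the start of a buffer and extracts the restart interval. It relies on the JPEG definition of DRI (marker 0xFFDD, a 16-bit length of 4, then a 16-bit interval counted in MCUs). The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parse a DRI segment at the start of p.
 * Returns the restart interval in MCUs, or -1 if p does not begin
 * with a well-formed 6-byte DRI segment. */
int jpeg_parse_dri(const uint8_t *p, size_t len)
{
    assert(p != NULL);
    if (len < 6)
        return -1;
    if (p[0] != 0xFF || p[1] != 0xDD)   /* DRI marker */
        return -1;
    if (p[2] != 0x00 || p[3] != 0x04)   /* segment length is always 4 */
        return -1;
    return (p[4] << 8) | p[5];          /* interval, big-endian */
}
```

The entropy-coded scan then begins at offset 6 from the start of the segment.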
JPEG markers appear explicitly on byte aligned boundaries beginning JPEG markers appear explicitly on byte aligned boundaries beginning
with an 0xFF. A ``stuffed'' 0x00 byte follows any 0xFF byte gen- with an 0xFF. A ``stuffed'' 0x00 byte follows any 0xFF byte gen-
erated by the entropy coder [1, Sec. B.1.1.5]. erated by the entropy coder [1, Sec. B.1.1.5].
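The byte-stuffing rule above means a marker can be located without Huffman decoding: an 0xFF followed by 0x00 is data, and an 0xFF followed by any other non-0xFF byte starts a marker. A minimal sketch, with hypothetical function names, is given below. The restart marker codes 0xD0 through 0xD7 are taken from the JPEG specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find the next marker at or after index start.
 * Returns the index of the marker's 0xFF byte and stores the marker
 * code in *code, or returns len if no marker is found. */
size_t jpeg_next_marker(const uint8_t *d, size_t len, size_t start,
                        uint8_t *code)
{
    assert(d != NULL && code != NULL);
    for (size_t i = start; i + 1 < len; i++) {
        if (d[i] != 0xFF)
            continue;
        if (d[i + 1] == 0x00) {   /* stuffed byte: 0xFF is scan data */
            i++;
            continue;
        }
        if (d[i + 1] == 0xFF)     /* fill byte: retry from the next 0xFF */
            continue;
        *code = d[i + 1];
        return i;
    }
    return len;
}

/* Nonzero if code is one of the restart markers RST0..RST7. */
int jpeg_is_rst(uint8_t code)
{
    return code >= 0xD0 && code <= 0xD7;
}
```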
4. Discussion 4. Discussion
4.1. The Type Field 4.1. The Type Field
The Type field defines the abbreviated table-specification and addi- The Type field defines the abbreviated table-specification and addi-
tional JFIF-style parameters not defined by JPEG, since they are not tional JFIF-style parameters not defined by JPEG, since they are not
present in the body of the transmitted JPEG data. The Type field must present in the body of the transmitted JPEG data. The Type field must
remain constant for the duration of a session. remain constant for the duration of a session.
Two type codes are currently defined. They both correspond to an abbre- Six type codes are currently defined. They correspond to an abbre-
viated table-specification indicating the ``Baseline DCT sequential'' viated table-specification indicating the ``Baseline DCT sequential''
mode, 8-bit samples, square pixels, no restart interval, three com- mode, 8-bit samples, square pixels, three components in the YUV color
ponents in the YUV color space, standard Huffman tables as defined in space, standard Huffman tables as defined in [1, Annex K.3], and a
[1, Annex K.3], and a single interleaved scan with a scan component single interleaved scan with a scan component selector indicating
selector indicating components 0, 1, and 2 in that order. The Y, U, and components 0, 1, and 2 in that order. The Y, U, and V color planes
V color planes correspond to component numbers 0, 1, and 2, respec- correspond to component numbers 0, 1, and 2, respectively. Component
tively. Component 0 (i.e., the luminance plane) uses Huffman table 0 (i.e., the luminance plane) uses Huffman table number 0 and quantiz-
number 0 and quantization table number 0 (defined below) and components ation table number 0 (defined below) and components 1 and 2 (i.e.,
1 and 2 (i.e., the chrominance planes) use Huffman table number 1 and the chrominance planes) use Huffman table number 1 and quantization
quantization table number 1 (defined below). table number 1 (defined below).
Additionally, video is non-interlaced and unscaled (i.e., the aspect Additionally, video is non-interlaced and unscaled (i.e., the aspect
ratio is determined by the image width and height). The frame rate is ratio is determined by the image width and height). The frame rate is
variable and explicit via the RTP timestamp. variable and explicit via the RTP timestamp.
Two RTP/JPEG types are currently defined that assume all of the above Six RTP/JPEG types are currently defined that assume all of the above.
and differ only in their JPEG sampling factors: The odd types have different JPEG sampling factors from the even ones:
horizontal vertical horizontal vertical
type comp samp. fact. samp. fact. types comp samp. fact. samp. fact.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0 | 0 | 2 | 1 | | 0/2/4 | 0 | 2 | 1 |
| 0 | 1 | 1 | 1 | | 0/2/4 | 1 | 1 | 1 |
| 0 | 2 | 1 | 1 | | 0/2/4 | 2 | 1 | 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 1 | 0 | 2 | 2 | | 1/3/5 | 0 | 2 | 2 |
| 1 | 1 | 1 | 1 | | 1/3/5 | 1 | 1 | 1 |
| 1 | 2 | 1 | 1 | | 1/3/5 | 2 | 1 | 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
These sampling factors indicate that the chrominance components of type These sampling factors indicate that the chrominance components of type
0 video are downsampled horizontally by 2 (often called 4:2:2) while the 0/2/4 video are downsampled horizontally by 2 (often called 4:2:2) while
chrominance components of type 1 video are downsampled both horizontally the chrominance components of type 1/3/5 video are downsampled both
and vertically by 2 (often called 4:2:0). horizontally and vertically by 2 (often called 4:2:0).
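The sampling-factor table above reduces to a test on the low bit of the type code. The sketch below covers the -03 types 0 through 5 (the -02 column defines only 0 and 1, which behave the same way). The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Look up the JPEG sampling factors for component comp (0..2) of an
 * RTP/JPEG stream of the given type (0..5).
 * Returns 0 on success, -1 for a type or component outside the table. */
int rtp_jpeg_sampling(unsigned type, unsigned comp, int *h, int *v)
{
    assert(h != NULL && v != NULL);
    if (type > 5 || comp > 2)
        return -1;
    /* Only the luminance component (0) has factors greater than 1. */
    *h = (comp == 0) ? 2 : 1;
    /* Odd types (1/3/5) also double the vertical factor: 4:2:0. */
    *v = (comp == 0 && (type & 1u)) ? 2 : 1;
    return 0;
}
```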
The three pairs of types (0/1), (2/3) and (4/5) differ from each other
as follows:
0/1 : No restart markers are present in the entropy data.
No restriction is placed on the fragmentation of the stream into
RTP packets.
The type specific field is unused and must be zero.
2/3 : Restart markers are present in the entropy data.
The entropy data is preceded by a DRI marker segment, defining the
restart interval.
No restriction is placed on the fragmentation of the stream into
RTP packets.
The type specific field is unused and must be zero.
4/5 : Restart markers are present in the entropy data.
The entropy data is preceded by a DRI marker segment, defining the
restart interval.
Restart intervals are sent as separate (possibly multiple) RTP
packets.
The type specific field (TSPEC) is used as follows:
A restart interval count (RCOUNT) is defined, which starts at
zero, and is incremented for each restart interval in the
frame.
The first packet of a restart interval gets TSPEC = RCOUNT.
Subsequent packets of the restart interval get TSPEC = 254,
except the final packet, which gets TSPEC = 255.
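One reading of the TSPEC rules for types 4/5 is sketched below from the sender's side. Two points are assumptions rather than statements in the text: a restart interval that fits in a single packet keeps TSPEC = RCOUNT (the receiver rule in section 4.4 appears to allow this), and RCOUNT must stay below 254 because 254 and 255 are used as flags. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill out[0..npkts-1] with the TSPEC values for the packets that
 * carry restart interval number rcount.
 * Returns 0 on success, -1 if the arguments cannot be encoded. */
int rtp_jpeg_fill_tspec(unsigned rcount, uint8_t *out, size_t npkts)
{
    assert(out != NULL);
    if (npkts == 0 || rcount > 253)   /* assumption: 254/255 are reserved */
        return -1;
    for (size_t i = 0; i < npkts; i++)
        out[i] = 254;                 /* subsequent packets */
    out[npkts - 1] = 255;             /* final packet */
    out[0] = (uint8_t)rcount;         /* first packet; wins when npkts == 1 */
    return 0;
}
```

For a three-packet interval with RCOUNT = 5 this yields 5, 254, 255.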
Additional types in the range 128-255 may be defined by external means, Additional types in the range 128-255 may be defined by external means,
such as a session protocol. such as a session protocol.
Appendix B contains C source code for transforming the RTP/JPEG header Appendix B contains C source code for transforming the RTP/JPEG header
parameters into the JPEG frame and scan headers that are absent from the parameters into the JPEG frame and scan headers that are absent from the
data payload. data payload.
4.2. The Q Field 4.2. The Q Field
skipping to change at page 7, line 37 skipping to change at page 8, line 18
each packet. Reassembly could be carried out without the offset field each packet. Reassembly could be carried out without the offset field
(i.e., using only the RTP marker bit and sequence numbers), but an effi- (i.e., using only the RTP marker bit and sequence numbers), but an effi-
cient single-copy implementation would not otherwise be possible in the cient single-copy implementation would not otherwise be possible in the
presence of misordered packets. Moreover, if the last packet of the presence of misordered packets. Moreover, if the last packet of the
previous frame (containing the marker bit) were dropped, then a receiver previous frame (containing the marker bit) were dropped, then a receiver
could not detect that the current frame is entirely intact. could not detect that the current frame is entirely intact.
4.4. Restart Markers 4.4. Restart Markers
Restart markers indicate a point in the JPEG stream at which the Huffman Restart markers indicate a point in the JPEG stream at which the Huffman
coder is reset, allowing partial decoding starting at that point. The codec and DC predictors are reset, allowing partial decoding starting
use of restart markers allows for robustness in the face of packet loss. at that point. The use of restart markers allows for robustness in the
However, not all hardware decoders support restart markers, meaning that face of packet loss.
such hardware will only be able to decode the first portion of a frame,
up to a restart marker, and then fail. Thus, for maximum interoperabil- RTP/JPEG Types 4/5 allow for partial decode of frames, due to the
ity, restart markers may not be present in type 0 and type 1 RTP/JPEG alignment of restart intervals with RTP packets. The decoder knows it
data. has a whole restart interval when it gets a sequence of packets with
contiguous RTP sequence numbers, starting with TSPEC<254 (RCOUNT) and
either ending with TSPEC==255, or TSPEC<255 and next packet's TSPEC<254
(or end of frame).
It can then decompress the RST interval, and paint it. The X and Y tile
offsets of the first MCU in the interval are given by:
tile_offset = RCOUNT * restart_interval * 2
x_offset = tile_offset % frame_width_in_tiles
y_offset = tile_offset / frame_width_in_tiles
The MCUs in a restart interval may span multiple tile rows.
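The tile-offset formulas above translate directly into C. This sketch uses the draft's own variable names. The function name is hypothetical, and the code does not assume any particular tile size, since the excerpt does not define one.

```c
#include <assert.h>
#include <stddef.h>

/* Compute the X and Y tile offsets of the first MCU of restart
 * interval number rcount, following the three formulas in the text.
 * Returns 0 on success, -1 if frame_width_in_tiles is zero. */
int rtp_jpeg_tile_offset(unsigned rcount, unsigned restart_interval,
                         unsigned frame_width_in_tiles,
                         unsigned *x_offset, unsigned *y_offset)
{
    assert(x_offset != NULL && y_offset != NULL);
    if (frame_width_in_tiles == 0)
        return -1;
    unsigned tile_offset = rcount * restart_interval * 2;
    *x_offset = tile_offset % frame_width_in_tiles;
    *y_offset = tile_offset / frame_width_in_tiles;
    return 0;
}
```

For example, with restart_interval = 4 and a frame 40 tiles wide, RCOUNT = 3 gives tile_offset = 24, so the interval starts at x = 24, y = 0, while RCOUNT = 6 gives tile_offset = 48, so it starts at x = 8, y = 1.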
Decoders can, however, treat types 4/5 as types 2/3, simply reassembling
the entire frame and then decoding.
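The completeness test for types 4/5 described earlier in this section can be sketched as a walk over the packets received so far. This is one interpretation of the rule, not the authors' code. The struct, the function name, and the end_of_frame flag (set when the last packet in the array carried the RTP marker bit) are assumptions introduced for illustration. Packets are assumed to be sorted by sequence number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rtp_jpeg_pkt {
    uint16_t seq;    /* RTP sequence number */
    uint8_t  tspec;  /* type-specific field from the RTP/JPEG header */
};

/* If a whole restart interval starts at pkts[i], return the number of
 * packets it occupies; otherwise return 0. */
size_t rtp_jpeg_whole_interval(const struct rtp_jpeg_pkt *pkts, size_t n,
                               size_t i, int end_of_frame)
{
    assert(pkts != NULL);
    if (i >= n || pkts[i].tspec >= 254)
        return 0;                        /* must start with an RCOUNT */
    for (size_t j = i; ; j++) {
        if (pkts[j].tspec == 255)
            return j - i + 1;            /* explicit final packet */
        if (j + 1 == n)                  /* nothing follows yet */
            return end_of_frame ? j - i + 1 : 0;
        if ((uint16_t)(pkts[j].seq + 1) != pkts[j + 1].seq)
            return 0;                    /* gap: a packet was lost */
        if (pkts[j + 1].tspec < 254)
            return j - i + 1;            /* next interval begins */
    }
}
```

A decoder that does not want this bookkeeping can ignore it and reassemble the whole frame, as the preceding paragraph notes.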
5. Security Considerations 5. Security Considerations
Security issues are not discussed in this memo. Security issues are not discussed in this memo.
6. Authors' Addresses 6. Authors' Addresses
Lance M. Berc Lance M. Berc
Systems Research Center Systems Research Center
Digital Equipment Corporation Digital Equipment Corporation
 End of changes. 
