draft-ietf-ltru-registry-03.txt   draft-ietf-ltru-registry-04.txt 
Network Working Group A. Phillips, Ed. Network Working Group A. Phillips, Ed.
Internet-Draft Quest Software Internet-Draft Quest Software
Expires: December 4, 2005 M. Davis, Ed. Expires: December 5, 2005 M. Davis, Ed.
IBM IBM
June 02, 2005 June 03, 2005
Tags for Identifying Languages Tags for Identifying Languages
draft-ietf-ltru-registry-03 draft-ietf-ltru-registry-04
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 4, 2005. This Internet-Draft will expire on December 5, 2005.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2005). Copyright (C) The Internet Society (2005).
Abstract Abstract
This document describes the structure, content, construction, and This document describes the structure, content, construction, and
semantics of language tags for use in cases where it is desirable to semantics of language tags for use in cases where it is desirable to
indicate the language used in an information object. It also indicate the language used in an information object. It also
describes how to register values for use in language tags and the describes how to register values for use in language tags and the
creation of user defined extensions for private interchange. This creation of user defined extensions for private interchange. This
document obsoletes RFC 3066 (which replaced RFC 1766). document obsoletes RFC 3066 (which replaced RFC 1766).
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. The Language Tag . . . . . . . . . . . . . . . . . . . . . . . 4 2. The Language Tag . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Length Considerations . . . . . . . . . . . . . . . . 6 2.1.1 Length Considerations . . . . . . . . . . . . . . . . 6
2.2 Language Subtag Sources and Interpretation . . . . . . . . 7 2.2 Language Subtag Sources and Interpretation . . . . . . . . 8
2.2.1 Primary Language Subtag . . . . . . . . . . . . . . . 9 2.2.1 Primary Language Subtag . . . . . . . . . . . . . . . 9
2.2.2 Extended Language Subtags . . . . . . . . . . . . . . 11 2.2.2 Extended Language Subtags . . . . . . . . . . . . . . 11
2.2.3 Script Subtag . . . . . . . . . . . . . . . . . . . . 11 2.2.3 Script Subtag . . . . . . . . . . . . . . . . . . . . 12
2.2.4 Region Subtag . . . . . . . . . . . . . . . . . . . . 12 2.2.4 Region Subtag . . . . . . . . . . . . . . . . . . . . 13
2.2.5 Variant Subtags . . . . . . . . . . . . . . . . . . . 13 2.2.5 Variant Subtags . . . . . . . . . . . . . . . . . . . 14
2.2.6 Extension Subtags . . . . . . . . . . . . . . . . . . 14 2.2.6 Extension Subtags . . . . . . . . . . . . . . . . . . 15
2.2.7 Private Use Subtags . . . . . . . . . . . . . . . . . 15 2.2.7 Private Use Subtags . . . . . . . . . . . . . . . . . 16
2.2.8 Pre-Existing RFC 3066 Registrations . . . . . . . . . 16 2.2.8 Pre-Existing RFC 3066 Registrations . . . . . . . . . 17
2.2.9 Classes of Conformance . . . . . . . . . . . . . . . . 16 2.2.9 Classes of Conformance . . . . . . . . . . . . . . . . 17
3. Registry Format and Maintenance . . . . . . . . . . . . . . . 18 3. Registry Format and Maintenance . . . . . . . . . . . . . . . 19
3.1 Format of the IANA Language Subtag Registry . . . . . . . 18 3.1 Format of the IANA Language Subtag Registry . . . . . . . 19
3.2 Maintenance of the Registry . . . . . . . . . . . . . . . 23 3.2 Maintenance of the Registry . . . . . . . . . . . . . . . 24
3.3 Stability of IANA Registry Entries . . . . . . . . . . . . 24 3.3 Stability of IANA Registry Entries . . . . . . . . . . . . 25
3.4 Registration Procedure for Subtags . . . . . . . . . . . . 27 3.4 Registration Procedure for Subtags . . . . . . . . . . . . 28
3.5 Possibilities for Registration . . . . . . . . . . . . . . 30 3.5 Possibilities for Registration . . . . . . . . . . . . . . 31
3.6 Extensions and Extensions Namespace . . . . . . . . . . . 32 3.6 Extensions and Extensions Namespace . . . . . . . . . . . 33
3.7 Initialization of the Registry . . . . . . . . . . . . . . 35 3.7 Initialization of the Registry . . . . . . . . . . . . . . 36
4. Formation and Processing of Language Tags . . . . . . . . . . 38 4. Formation and Processing of Language Tags . . . . . . . . . . 39
4.1 Choice of Language Tag . . . . . . . . . . . . . . . . . . 38 4.1 Choice of Language Tag . . . . . . . . . . . . . . . . . . 39
4.2 Meaning of the Language Tag . . . . . . . . . . . . . . . 40 4.2 Meaning of the Language Tag . . . . . . . . . . . . . . . 41
4.3 Canonicalization of Language Tags . . . . . . . . . . . . 41 4.3 Canonicalization of Language Tags . . . . . . . . . . . . 42
4.4 Considerations for Private Use Subtags . . . . . . . . . . 43 4.4 Considerations for Private Use Subtags . . . . . . . . . . 44
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 45
6. Security Considerations . . . . . . . . . . . . . . . . . . . 45 6. Security Considerations . . . . . . . . . . . . . . . . . . . 46
7. Character Set Considerations . . . . . . . . . . . . . . . . . 46 7. Character Set Considerations . . . . . . . . . . . . . . . . . 47
8. Changes from RFC 3066 . . . . . . . . . . . . . . . . . . . . 47 8. Changes from RFC 3066 . . . . . . . . . . . . . . . . . . . . 48
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 51 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 52
9.1 Normative References . . . . . . . . . . . . . . . . . . . 51 9.1 Normative References . . . . . . . . . . . . . . . . . . . 52
9.2 Informative References . . . . . . . . . . . . . . . . . . 52 9.2 Informative References . . . . . . . . . . . . . . . . . . 53
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 53 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 54
A. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 54 A. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 55
B. Examples of Language Tags (Informative) . . . . . . . . . . . 55 B. Examples of Language Tags (Informative) . . . . . . . . . . . 56
C. Example Registry . . . . . . . . . . . . . . . . . . . . . . . 58 C. Example Registry . . . . . . . . . . . . . . . . . . . . . . . 59
Intellectual Property and Copyright Statements . . . . . . . . 62 Intellectual Property and Copyright Statements . . . . . . . . 63
1. Introduction 1. Introduction
Human beings on our planet have, past and present, used a number of Human beings on our planet have, past and present, used a number of
languages. There are many reasons why one would want to identify the languages. There are many reasons why one would want to identify the
language used when presenting or requesting information. language used when presenting or requesting information.
Information about a user's language preferences commonly needs to be Information about a user's language preferences commonly needs to be
identified so that appropriate processing can be applied. For identified so that appropriate processing can be applied. For
example, the user's language preferences in a browser can be used to example, the user's language preferences in a browser can be used to
select web pages appropriately. A choice of language preference can select web pages appropriately. A choice of language preference can
also be used to select among tools (such as dictionaries) to assist also be used to select among tools (such as dictionaries) to assist
in the processing or understanding of content in different languages. in the processing or understanding of content in different languages.
In addition, knowledge about the particular language used by some In addition, knowledge about the particular language used by some
piece of information content may be useful or even required by some piece of information content might be useful or even required by some
types of information processing; for example spell-checking, types of information processing; for example spell-checking,
computer-synthesized speech, Braille transcription, or high-quality computer-synthesized speech, Braille transcription, or high-quality
print renderings. print renderings.
One means of indicating the language used is by labeling the One means of indicating the language used is by labeling the
information content with a language identifier. These identifiers information content with a language identifier. These identifiers
can also be used to specify user preferences when selecting can also be used to specify user preferences when selecting
information content, or for labeling additional attributes of content information content, or for labeling additional attributes of content
and associated resources. and associated resources.
skipping to change at page 5, line 41 skipping to change at page 5, line 41
grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum)) grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum))
; grandfathered registration ; grandfathered registration
; Note: i is the only singleton ; Note: i is the only singleton
; that starts a grandfathered tag ; that starts a grandfathered tag
alphanum = (ALPHA / DIGIT) ; letters and numbers alphanum = (ALPHA / DIGIT) ; letters and numbers
Figure 1: Language Tag ABNF Figure 1: Language Tag ABNF
The character "-" is HYPHEN-MINUS (ABNF: %x2D). All subtags have a The character "-" is HYPHEN-MINUS (ABNF: %x2D). All subtags have a
maximum length of eight characters. Note that there is a subtlety in maximum length of eight characters. Note that there is a subtlety in
the ABNF for 'variant': variants starting with a digit may be only the ABNF for 'variant': variants starting with a digit MAY be four
four characters long, while those starting with a letter must be at characters long, while those starting with a letter MUST be at least
least five characters long. five characters long.
Whitespace is not permitted in a language tag. For examples of Whitespace is not permitted in a language tag. For examples of
language tags, see Appendix B. language tags, see Appendix B.
Note that although [7] refers to octets, the language tags described Note that although [7] refers to octets, the language tags described
in this document are sequences of characters from the US-ASCII in this document are sequences of characters from the US-ASCII
repertoire. Language tags may be used in documents and applications repertoire. Language tags MAY be used in documents and applications
that use other encodings, so long as these encompass the US-ASCII that use other encodings, so long as these encompass the US-ASCII
repertoire. An example of this would be an XML document that uses repertoire. An example of this would be an XML document that uses
the UTF-16LE [12] encoding of Unicode [20]. the UTF-16LE [12] encoding of Unicode [21].
The tags and their subtags, including private-use and extensions, are The tags and their subtags, including private-use and extensions, are
to be treated as case insensitive: there exist conventions for the to be treated as case insensitive: there exist conventions for the
capitalization of some of the subtags, but these should not be taken capitalization of some of the subtags, but these MUST NOT be taken to
to carry meaning. carry meaning.
For example: For example:
o [ISO 639] [1] recommends that language codes be written in lower o [ISO 639] [1] recommends that language codes be written in lower
case ('mn' Mongolian). case ('mn' Mongolian).
o [ISO 3166] [4] recommends that country codes be capitalized ('MN' o [ISO 3166] [4] recommends that country codes be capitalized ('MN'
Mongolia). Mongolia).
o [ISO 15924] [3] recommends that script codes use lower case with o [ISO 15924] [3] recommends that script codes use lower case with
skipping to change at page 6, line 32 skipping to change at page 6, line 32
However, in the tags defined by this document, the uppercase US-ASCII However, in the tags defined by this document, the uppercase US-ASCII
letters in the range 'A' through 'Z' are considered equivalent and letters in the range 'A' through 'Z' are considered equivalent and
mapped directly to their US-ASCII lowercase equivalents in the range mapped directly to their US-ASCII lowercase equivalents in the range
'a' through 'z'. Thus the tag "mn-Cyrl-MN" is not distinct from "MN- 'a' through 'z'. Thus the tag "mn-Cyrl-MN" is not distinct from "MN-
cYRL-mn" or "mN-cYrL-Mn" (or any other combination) and each of these cYRL-mn" or "mN-cYrL-Mn" (or any other combination) and each of these
variations conveys the same meaning: Mongolian written in the variations conveys the same meaning: Mongolian written in the
Cyrillic script as used in Mongolia. Cyrillic script as used in Mongolia.
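The case-folding rule above can be sketched as follows. This is a minimal illustration, not part of either draft; the function names fold_tag and same_tag are hypothetical. Only the US-ASCII letters 'A' through 'Z' are mapped, as the text describes.

```python
import string

# Map only US-ASCII 'A'-'Z' to 'a'-'z'; all other characters are left alone.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def fold_tag(tag: str) -> str:
    """Return the tag with US-ASCII uppercase letters mapped to lowercase."""
    return tag.translate(_ASCII_LOWER)


def same_tag(a: str, b: str) -> bool:
    """Compare two language tags without regard to US-ASCII letter case."""
    return fold_tag(a) == fold_tag(b)


print(same_tag("mn-Cyrl-MN", "MN-cYRL-mn"))  # True
print(same_tag("mn-Cyrl-MN", "mn-Latn-MN"))  # False
```

A translation table is used instead of str.lower() so that the mapping stays within the US-ASCII range named in the text.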
2.1.1 Length Considerations 2.1.1 Length Considerations
Although neither the ABNF nor other guidelines in this document RFC 3066 [24] did not provide an upper limit on the size of language
provide a fixed upper limit on the number of subtags in a Language tags. While RFC 3066 did define the semantics of particular subtags
Tag (and thus the upper bound on the size of a tag) and it is in such a way that most language tags consisted of language and
possible to envision quite long and complex subtag sequences, in region subtags with a combined total length of up to six characters,
practice these are rare because additional granularity in tags seldom much larger registered tags were not only possible but were actually
adds useful distinguishing information and because longer, more registered.
granular tags interfere with the meaning, understanding, and
processing of language tags.
A conformant implementation MAY refuse to support the storage of Neither this document nor the syntax in the ABNF imposes a fixed
language tags which exceed a specified length. For an example, see upper limit on the number of subtags in a language tag (and thus an
[RFC 2231] [22]. Any such limitation SHOULD be clearly documented, upper bound on the size of a tag). The syntax in this document
and such documentation SHOULD include the disposition of any longer suggests that, depending on the specific language, more subtags (and
tags (for example, whether an error value is generated or the thus characters) are sometimes necessary to form a complete tag; thus
language tag is truncated). If truncation is permitted it MUST NOT it is possible to envision long or complex subtag sequences.
permit a subtag to be divided. Implementations that restrict storage
should consider removing extensions before processing. A protocol
that allows tags to be truncated at an arbitrary limit, without
giving any indication of what that limit is, has the potential for
causing harm by changing the meaning of tags in substantial ways.
In particular, variant subtags SHOULD be used only with their Some applications and protocols are forced to allocate fixed buffer
recommended prefix. In practice, this limits most tags to a sequence sizes or otherwise limit the length of a language tag in a particular
of four subtags, and thus a maximum length of 26 characters application. A conformant implementation or specification MAY refuse
(excluding any extensions or private use sequences). This is because to support the storage of language tags which exceed a specified
subtags are limited to a length of eight characters and the extlang, length. Any such limitation SHOULD be clearly documented, and such
script, and region subtags are limited to even fewer characters. See documentation SHOULD include the disposition of any longer tags (for
Section 4.1 for more information on selecting the most appropriate example, whether an error value is generated or the language tag is
Language Tag. truncated).
Longer tags are possible. The longest tags (excluding extensions) In practice, most tags do not require additional subtags or
could have a length of up to 62 characters, as shown below. substantially more characters. Additional subtags sometimes add
Implementations MUST be able to handle tags of this length without useful distinguishing information, but extraneous subtags interfere
truncation. Support for tags of up to 64 characters is RECOMMENDED. with the meaning, understanding, and processing of language tags.
Implementations MAY support longer tags. Since language tags MAY be truncated by an application or protocol
that limits tag sizes, when choosing language tags users and
applications SHOULD avoid adding subtags that add no distinguishing
value. In particular, users and implementations SHOULD follow the
'Prefix' and 'Suppress-Script' fields in the registry (defined in
Section 3.1): these fields provide guidance on when specific
additional subtags SHOULD (and SHOULD NOT) be used in a language tag.
(For more information on selecting subtags, see Section 4.1.)
Here is how the 62-character length of the longest practical tag Implementations MUST support a limit of at least 33 characters. This
(excluding extensions) is derived: limit includes at least one subtag of each non-extension, non-private
use type. When choosing a buffer limit, a length of at least 42
characters is strongly RECOMMENDED.
language = 3 If truncation is permitted it MUST NOT permit a subtag to be divided
extlang1 = 4 or the formation of invalid tags (for example, one ending with the
"-" character). A protocol that allows tags to be truncated at an
arbitrary limit, without giving any indication of what that limit is,
has the potential for causing harm by changing the meaning of tags in
substantial ways.
Some specifications are space constrained but do not have a fixed
length limitation. For example, see [RFC 2231] [23]. This protocol
has no explicit length limitation: the language tag's length is
limited by the length of other header components (such as the
charset's name) coupled with the 78 character limit in [RFC 2822]
[14]. Thus the "limit" might be 60 or more characters, but it could
potentially be quite small. In these cases, implementations SHOULD
use the longest possible language tag. Warning the user of
truncation, if necessary, is RECOMMENDED, as truncation can change
the semantic meaning of the tag.
The following illustration shows how the 42-character recommendation
was derived. The combination of language and extended language
subtags was chosen for future compatibility. At up to 11 characters,
this combination is longer than the longest possible language subtag
(8 characters):
language = 3 (ISO 639-2; ISO 639-1 requires 2)
extlang1 = 4 (each subsequent subtag includes '-')
extlang2 = 4 (unlikely: needs prefix="language-extlang1") extlang2 = 4 (unlikely: needs prefix="language-extlang1")
extlang3 = 4 (extremely unlikely) extlang3 = 4 (extremely unlikely)
script = 5 script = 5 (if not suppressed: see Section 4.1)
region = 4 (UN M.49) region = 4 (UN M.49; ISO 3166 requires 3)
variant1 = 9 variant1 = 9 (MUST have language as a prefix)
variant2 = 9 (unlikely: needs prefix="language-variant1") variant2 = 9 (MUST have language-variant1 as a prefix)
private use 1 = 11 ("-x-" + subtag)
private use 2 = 9
total = 62 characters
Figure 2: Derivation of the Longest Tag total = 42 characters
Figure 2: Derivation of the Limit on Tag Length
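The arithmetic behind the two totals in Figure 2 can be checked with a short sketch. The per-subtag lengths are copied from the figure; the dictionary names are illustrative only.

```python
# Lengths from Figure 2. Every subtag after the first includes its leading "-".
LIMIT_SUBTAGS = {
    "language": 3,
    "extlang1": 4,
    "extlang2": 4,
    "extlang3": 4,
    "script": 5,
    "region": 4,
    "variant1": 9,
    "variant2": 9,
}

# The -03 figure additionally counted two private use subtags.
PRIVATE_USE_SUBTAGS = {
    "private use 1": 11,  # "-x-" plus an eight-character subtag
    "private use 2": 9,
}

new_total = sum(LIMIT_SUBTAGS.values())
old_total = new_total + sum(PRIVATE_USE_SUBTAGS.values())

print(new_total, old_total)  # prints: 42 62
```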
Applications or protocols which have to truncate a tag MUST do so by
progressively removing subtags along with their preceding "-" from
the right side of the language tag until the tag is short enough for
the given buffer. If the resulting tag ends with a single-character
subtag, that subtag and its preceding "-" MUST also be removed. For
example:
Tag to truncate: zh-Hant-CN-variant1-a-extend1-x-wadegile-private1
1. zh-Hant-CN-variant1-a-extend1-x-wadegile
2. zh-Hant-CN-variant1-a-extend1
3. zh-Hant-CN-variant1
4. zh-Hant-CN
5. zh-Hant
6. zh
Figure 3: Example of Tag Truncation
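The truncation rule illustrated in Figure 3 might be implemented along the following lines. This is a sketch under stated assumptions, not a normative algorithm: the function name truncate_tag is hypothetical, and the behavior when even the first subtag exceeds the buffer (the subtag is returned unchanged rather than divided) is an assumption, since the text does not address that case.

```python
def truncate_tag(tag: str, limit: int) -> str:
    """Shorten a language tag to at most `limit` characters.

    Subtags are removed from the right, together with their preceding "-".
    A trailing single-character subtag is removed as well, so the result
    never ends in a dangling singleton such as "-a" or "-x".
    """
    subtags = tag.split("-")
    while len(subtags) > 1 and len("-".join(subtags)) > limit:
        subtags.pop()
        # Drop any singleton left exposed at the end of the tag.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)


example = "zh-Hant-CN-variant1-a-extend1-x-wadegile-private1"
for limit in (48, 39, 28, 18, 9, 6):
    print(limit, truncate_tag(example, limit))
```

With the limits shown, the loop walks through the six steps of Figure 3 in order. A tag whose first subtag is itself a singleton (for example, one beginning with 'x') would need extra handling that this sketch does not attempt.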
2.2 Language Subtag Sources and Interpretation 2.2 Language Subtag Sources and Interpretation
The namespace of language tags and their subtags is administered by The namespace of language tags and their subtags is administered by
the Internet Assigned Numbers Authority (IANA) [13] according to the the Internet Assigned Numbers Authority (IANA) [13] according to the
rules in Section 5 of this document. The registry maintained by IANA rules in Section 5 of this document. The registry maintained by IANA
is the source for valid subtags: other standards referenced in this is the source for valid subtags: other standards referenced in this
section provide the source material for that registry. section provide the source material for that registry.
Terminology in this section: Terminology in this section:
skipping to change at page 9, line 48 skipping to change at page 10, line 43
and MAY be used to form the primary language subtag. At the time and MAY be used to form the primary language subtag. At the time
this document was created, there were no examples of this kind of this document was created, there were no examples of this kind of
subtag and future registrations of this type will be discouraged: subtag and future registrations of this type will be discouraged:
primary languages are strongly RECOMMENDED for registration with primary languages are strongly RECOMMENDED for registration with
ISO 639 and proposals rejected by ISO 639/RA will be closely ISO 639 and proposals rejected by ISO 639/RA will be closely
scrutinized before they are registered with IANA. scrutinized before they are registered with IANA.
6. The single character subtag 'x' as the primary subtag indicates 6. The single character subtag 'x' as the primary subtag indicates
that the language tag consists solely of subtags whose meaning is that the language tag consists solely of subtags whose meaning is
defined by private agreement. For example, in the tag "x-fr-CH", defined by private agreement. For example, in the tag "x-fr-CH",
the subtags 'fr' and 'CH' should not be taken to represent the the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the
French language or the country of Switzerland (or any other value French language or the country of Switzerland (or any other value
in the IANA registry) unless there is a private agreement in in the IANA registry) unless there is a private agreement in
place to do so. See Section 4.4. place to do so. See Section 4.4.
7. The single character subtag 'i' is used by some grandfathered 7. The single character subtag 'i' is used by some grandfathered
tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other
grandfathered tags have a primary language subtag in their first grandfathered tags have a primary language subtag in their first
position.) position.)
8. Other values MUST NOT be assigned to the primary subtag except by 8. Other values MUST NOT be assigned to the primary subtag except by
revision or update of this document. revision or update of this document.
Note: For languages that have both an ISO 639-1 two character code Note: For languages that have both an ISO 639-1 two character code
and an ISO 639-2 three character code, only the ISO 639-1 two and an ISO 639-2 three character code, only the ISO 639-1 two
character code is defined in the IANA registry. character code is defined in the IANA registry.
Note: For languages that have no ISO 639-1 two character code and for Note: For languages that have no ISO 639-1 two character code and for
which the ISO 639-2/T (Terminology) code and the ISO 639-2/B which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
(Bibliographic) codes differ, only the Terminology code is defined in (Bibliographic) codes differ, only the Terminology code is defined in
the IANA registry. At the time this document was created, all the IANA registry. At the time this document was created, all
languages that had both kinds of three character code were also languages that had both kinds of three character code were also
assigned a two character code; it is not expected that future assigned a two character code; it is not expected that future
assignments of this nature will occur. assignments of this nature will occur.
Note: To avoid problems with versioning and subtag choice as Note: To avoid problems with versioning and subtag choice as
experienced during the transition between RFC 1766 and RFC 3066, as experienced during the transition between RFC 1766 and RFC 3066, as
well as the canonical nature of subtags defined by this document, the well as the canonical nature of subtags defined by this document, the
ISO 639 Registration Authority Joint Advisory Committee (ISO 639/ ISO 639 Registration Authority Joint Advisory Committee (ISO 639/
RA-JAC) has included the following statement in [16]: RA-JAC) has included the following statement in [17]:
"A language code already in ISO 639-2 at the point of freezing ISO "A language code already in ISO 639-2 at the point of freezing ISO
639-1 shall not later be added to ISO 639-1. This is to ensure 639-1 shall not later be added to ISO 639-1. This is to ensure
consistency in usage over time, since users are directed in Internet consistency in usage over time, since users are directed in Internet
applications to employ the alpha-3 code when an alpha-2 code for that applications to employ the alpha-3 code when an alpha-2 code for that
language is not available." language is not available."
In order to avoid instability of the canonical form of tags, if a two In order to avoid instability of the canonical form of tags, if a two
character code is added to ISO 639-1 for a language for which a three character code is added to ISO 639-1 for a language for which a three
character code was already included in ISO 639-2, the two character character code was already included in ISO 639-2, the two character
skipping to change at page 12, line 7 skipping to change at page 12, line 52
subtag and all extended language subtags and MUST occur before subtag and all extended language subtags and MUST occur before
any other type of subtag described below. any other type of subtag described below.
3. The script subtags 'Qaaa' through 'Qabx' are reserved for private 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private
use in language tags. These subtags correspond to codes reserved use in language tags. These subtags correspond to codes reserved
by ISO 15924 for private use. These codes MAY be used for non- by ISO 15924 for private use. These codes MAY be used for non-
registered script values. Please refer to Section 4.4 for more registered script values. Please refer to Section 4.4 for more
information on private-use subtags. information on private-use subtags.
4. Script subtags cannot be registered using the process in 4. Script subtags cannot be registered using the process in
Section 3.4 of this document. Variant subtags may be considered Section 3.4 of this document. Variant subtags MAY be considered
for registration for that purpose. for registration for that purpose.
Example: "de-Latn" represents German written using the Latin script. Example: "de-Latn" represents German written using the Latin script.
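As an illustration of rule 3 above, a check for the private use script range 'Qaaa' through 'Qabx' could look like this. The function name is hypothetical, and the comparison relies on the case-insensitivity of subtags described in Section 2.1.

```python
def is_private_use_script(subtag: str) -> bool:
    """Return True if the subtag lies in the range 'Qaaa' through 'Qabx'."""
    s = subtag.lower()
    if len(s) != 4 or not s.isascii() or not s.isalpha():
        return False
    # For equal-length lowercase ASCII strings, string comparison follows
    # alphabetical order, so a simple range test is enough.
    return "qaaa" <= s <= "qabx"


print(is_private_use_script("Qaaa"))  # True
print(is_private_use_script("Latn"))  # False
```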
2.2.4 Region Subtag 2.2.4 Region Subtag
The following rules apply to the region subtags: The following rules apply to the region subtags:
1. The region subtag defines language variations used in a specific 1. The region subtag defines language variations used in a specific
region, geographic, or political area. Region subtags MUST region, geographic, or political area. Region subtags MUST
skipping to change at page 13, line 5 skipping to change at page 14, line 4
C. UN numeric codes for countries with ambiguous ISO 3166 C. UN numeric codes for countries with ambiguous ISO 3166
alpha-2 codes as defined in Section 3.3 are defined in the alpha-2 codes as defined in Section 3.3 are defined in the
registry and are canonical for the given country or region registry and are canonical for the given country or region
defined. defined.
D. The alphanumeric codes in Appendix X of the UN document are D. The alphanumeric codes in Appendix X of the UN document are
_not_ defined and MUST NOT be used to form language tags. _not_ defined and MUST NOT be used to form language tags.
(At the time this document was created these values match the (At the time this document was created these values match the
ISO 3166 alpha-2 codes.) ISO 3166 alpha-2 codes.)
4. There MUST be at most one region subtag in a language tag.
4. There may be at most one region subtag in a language tag.
5. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are 5. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
reserved for private use in language tags. These subtags reserved for private use in language tags. These subtags
correspond to codes reserved by ISO 3166 for private use. These correspond to codes reserved by ISO 3166 for private use. These
codes MAY be used for private use region subtags (instead of codes MAY be used for private use region subtags (instead of
using a private-use subtag sequence). Please refer to using a private-use subtag sequence). Please refer to
Section 4.4 for more information on private use subtags. Section 4.4 for more information on private use subtags.
"de-CH" represents German ('de') as used in Switzerland ('CH'). "de-CH" represents German ('de') as used in Switzerland ('CH').
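The private use region codes listed in rule 5 can be tested with a sketch like the one below. The function name is hypothetical; the ranges are taken directly from the rule.

```python
def is_private_use_region(subtag: str) -> bool:
    """Return True for 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' (any letter case)."""
    s = subtag.upper()
    if len(s) != 2 or not s.isascii() or not s.isalpha():
        return False
    return s in ("AA", "ZZ") or "QM" <= s <= "QZ" or "XA" <= s <= "XZ"


print(is_private_use_region("XA"))  # True
print(is_private_use_region("CH"))  # False
```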
skipping to change at page 13, line 39 skipping to change at page 14, line 37
registration process defined in Section 3.4. registration process defined in Section 3.4.
2. Variant subtags MUST follow all of the other defined subtags, but 2. Variant subtags MUST follow all of the other defined subtags, but
precede any extension or private-use subtag sequences. precede any extension or private-use subtag sequences.
3. More than one variant MAY be used to form the language tag. 3. More than one variant MAY be used to form the language tag.
4. Variant subtags MUST be registered with IANA according to the 4. Variant subtags MUST be registered with IANA according to the
rules in Section 3.4 of this document before being used to form rules in Section 3.4 of this document before being used to form
language tags. In order to distinguish variants from other types language tags. In order to distinguish variants from other types
of subtags, registrations must meet the following length and of subtags, registrations MUST meet the following length and
content restrictions: content restrictions:
1. Variant subtags that begin with a letter (a-z, A-Z) MUST be 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be
at least five characters long. at least five characters long.
2. Variant subtags that begin with a digit (0-9) MUST be at 2. Variant subtags that begin with a digit (0-9) MUST be at
least four characters long. least four characters long.
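The two length restrictions above can be sketched as a small check. This is illustrative only: the function name is hypothetical, the restriction to ASCII letters and digits is an assumption, and the sketch does not check any upper length limit or whether the subtag is actually registered.

```python
import string

_LETTERS = set(string.ascii_letters)
_ALNUM = _LETTERS | set(string.digits)

def variant_length_ok(subtag: str) -> bool:
    """Apply only the minimum-length rules for variant subtags."""
    # Assumption: variant subtags consist of ASCII letters and digits only.
    if not subtag or any(ch not in _ALNUM for ch in subtag):
        return False
    if subtag[0] in _LETTERS:
        return len(subtag) >= 5   # begins with a letter
    return len(subtag) >= 4       # begins with a digit
```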
Variant subtag records in the language subtag registry may include Variant subtag records in the language subtag registry MAY include
one or more 'Prefix' fields, which indicate the language tag or tags one or more 'Prefix' fields, which indicate the language tag or tags
that would make a suitable prefix (with other subtags, as that would make a suitable prefix (with other subtags, as
appropriate) in forming a language tag with the variant. For appropriate) in forming a language tag with the variant. For
example, the subtag 'scouse' has a Prefix of "en", making it suitable example, the subtag 'scouse' has a Prefix of "en", making it suitable
to form language tags such as "en-scouse" and "en-GB-scouse", but not to form language tags such as "en-scouse" and "en-GB-scouse", but not
suitable for use in a tag such as "zh-scouse" or "it-GB-scouse". suitable for use in a tag such as "zh-scouse" or "it-GB-scouse".
"en-scouse" represents the Scouse dialect of English. "en-scouse" represents the Scouse dialect of English.
"de-CH-1996" represents German as used in Switzerland and as written "de-CH-1996" represents German as used in Switzerland and as written
using the spelling reform beginning in the year 1996 C.E. using the spelling reform beginning in the year 1996 C.E.
Most variants that share a prefix are mutually exclusive. For Most variants that share a prefix are mutually exclusive. For
example, the German orthographic variations '1996' and '1901' should example, the German orthographic variations '1996' and '1901' SHOULD
not be used in the same tag, as they represent the dates of different NOT be used in the same tag, as they represent the dates of different
spelling reforms. A variant that may be used in combination with spelling reforms. A variant that can meaningfully be used in
another variant should include a 'Prefix' field in its registry combination with another variant SHOULD include a 'Prefix' field in
record that lists that other variant. For example, if another German its registry record that lists that other variant. For example, if
variant 'example' were created that made sense to use with '1996', another German variant 'example' were created that made sense to use
then 'example' should include two Prefix fields: "de" and "de-1996". with '1996', then 'example' should include two Prefix fields: "de"
and "de-1996".
2.2.6 Extension Subtags 2.2.6 Extension Subtags
The following rules apply to extensions: The following rules apply to extensions:
1. Extension subtags are separated from the other subtags defined 1. Extension subtags are separated from the other subtags defined
in this document by a single-letter subtag ("singleton"). The in this document by a single-letter subtag ("singleton"). The
singleton MUST be one allocated to a registration authority via singleton MUST be one allocated to a registration authority via
the mechanism described in Section 3.6 and cannot be the letter the mechanism described in Section 3.6 and cannot be the letter
'x', which is reserved for private-use subtag sequences. 'x', which is reserved for private-use subtag sequences.
skipping to change at page 16, line 23 skipping to change at page 17, line 20
maintain their validity. IANA will maintain these tags in the maintain their validity. IANA will maintain these tags in the
registry under either the "grandfathered" or "redundant" type. For registry under either the "grandfathered" or "redundant" type. For
more information see Section 3.7. more information see Section 3.7.
It is important to note that all language tags formed under the It is important to note that all language tags formed under the
guidelines in this document were either legal, well-formed tags or guidelines in this document were either legal, well-formed tags or
could have been registered under RFC 3066. could have been registered under RFC 3066.
2.2.9 Classes of Conformance 2.2.9 Classes of Conformance
Implementations may wish to express their level of conformance with Implementations sometimes need to describe their capabilities with
the rules and practices described in this document. There are regard to the rules and practices described in this document. There
generally two classes of conforming implementations: "well-formed" are two classes of conforming implementations described by this
processors and "validating" processors. Claims of conformance SHOULD document: "well-formed" processors and "validating" processors.
explicitly reference one of these definitions. Claims of conformance SHOULD explicitly reference one of these
definitions.
An implementation that claims to check for well-formed language tags An implementation that claims to check for well-formed language tags
MUST: MUST:
o Check that the tag and all of its subtags, including extension and o Check that the tag and all of its subtags, including extension and
private-use subtags, conform to the ABNF or that the tag is on the private-use subtags, conform to the ABNF or that the tag is on the
list of grandfathered tags. list of grandfathered tags.
o Check that singleton subtags that identify extensions do not o Check that singleton subtags that identify extensions do not
repeat. For example, the tag "en-a-xx-b-yy-a-zz" is not well- repeat. For example, the tag "en-a-xx-b-yy-a-zz" is not well-
skipping to change at page 18, line 32 skipping to change at page 19, line 32
3.1 Format of the IANA Language Subtag Registry 3.1 Format of the IANA Language Subtag Registry
The IANA Language Subtag Registry ("the registry") will consist of a The IANA Language Subtag Registry ("the registry") will consist of a
text file that is machine readable in the format described in this text file that is machine readable in the format described in this
section, plus copies of the registration forms approved by the section, plus copies of the registration forms approved by the
Language Subtag Reviewer in accordance with the process described in Language Subtag Reviewer in accordance with the process described in
Section 3.4. With the exception of the registration forms for Section 3.4. With the exception of the registration forms for
grandfathered and redundant tags, no registration records will be grandfathered and redundant tags, no registration records will be
maintained for the initial set of subtags. maintained for the initial set of subtags.
The registry will be in a modified record-jar format text file [17]. The registry will be in a modified record-jar format text file [18].
Lines are limited to 72 characters, including all whitespace. Lines are limited to 72 characters, including all whitespace.
Records are separated by lines containing only the sequence "%%" Records are separated by lines containing only the sequence "%%"
(%x25.25). (%x25.25).
Each field can be viewed as a single, logical line of ASCII Each field can be viewed as a single, logical line of ASCII
characters, comprising a field-name and a field-body separated by a characters, comprising a field-name and a field-body separated by a
COLON character (%x3A). For convenience, the field-body portion of COLON character (%x3A). For convenience, the field-body portion of
this conceptual entity can be split into a multiple-line this conceptual entity can be split into a multiple-line
representation; this is called "folding". The format of the registry representation; this is called "folding". The format of the registry
skipping to change at page 19, line 11 skipping to change at page 20, line 11
UNICHAR = "&#x" 2*6HEXDIG ";" UNICHAR = "&#x" 2*6HEXDIG ";"
The sequence '..' (%x2E.2E) in a field-body denotes a range of The sequence '..' (%x2E.2E) in a field-body denotes a range of
values. Such a range represents all subtags of the same length that values. Such a range represents all subtags of the same length that
are alphabetically within that range, including the values explicitly are alphabetically within that range, including the values explicitly
mentioned. For example 'a..c' denotes the values 'a', 'b', and 'c'. mentioned. For example 'a..c' denotes the values 'a', 'b', and 'c'.
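One way to expand such a range is to treat each subtag as a base-26 number. The sketch below is illustrative: the function names are hypothetical, and it assumes both endpoints consist of lowercase ASCII letters only (ranges in other cases or with digits are rejected rather than guessed at).

```python
import string

_LOWER = set(string.ascii_lowercase)

def _to_number(subtag: str) -> int:
    """Read a lowercase subtag as a base-26 number ('a' = 0 ... 'z' = 25)."""
    n = 0
    for ch in subtag:
        n = n * 26 + (ord(ch) - ord("a"))
    return n

def _to_subtag(n: int, length: int) -> str:
    """Inverse of _to_number, padded to the given length."""
    chars = []
    for _ in range(length):
        n, digit = divmod(n, 26)
        chars.append(chr(ord("a") + digit))
    return "".join(reversed(chars))

def expand_range(field_body: str) -> list:
    """Expand 'first..last' into every same-length value in the range."""
    first, sep, last = field_body.partition("..")
    if not sep:
        return [field_body]          # not a range, a single value
    if not first or len(first) != len(last):
        raise ValueError("range endpoints must have the same length")
    if not (set(first) | set(last)) <= _LOWER:
        raise ValueError("this sketch handles lowercase ASCII letters only")
    lo, hi = _to_number(first), _to_number(last)
    if lo > hi:
        raise ValueError("range endpoints are out of order")
    return [_to_subtag(n, len(first)) for n in range(lo, hi + 1)]
```

With this sketch, expand_range("a..c") yields the three values 'a', 'b', and 'c', endpoints included.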
Characters from outside the US-ASCII repertoire, as well as the Characters from outside the US-ASCII repertoire, as well as the
AMPERSAND character ("&", %x26) when it occurs in a field-body are AMPERSAND character ("&", %x26) when it occurs in a field-body are
represented by a "Numeric Character Reference" using hexadecimal represented by a "Numeric Character Reference" using hexadecimal
notation in the style used by XML 1.0 [18] (see notation in the style used by XML 1.0 [19] (see
<http://www.w3.org/TR/REC-xml/#dt-charref>). This consists of the <http://www.w3.org/TR/REC-xml/#dt-charref>). This consists of the
sequence "&#x" (%x26.23.78) followed by a hexadecimal representation sequence "&#x" (%x26.23.78) followed by a hexadecimal representation
of the character's code point in ISO/IEC 10646 [6] followed by a of the character's code point in ISO/IEC 10646 [6] followed by a
closing semicolon (%x3B). For example, the EURO SIGN, U+20AC, would closing semicolon (%x3B). For example, the EURO SIGN, U+20AC, would
be represented by the sequence "&#x20AC;". Note that the hexadecimal be represented by the sequence "&#x20AC;". Note that the hexadecimal
notation may have between two and six digits. notation MAY have between two and six digits.
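The escaping described above can be sketched as a pair of helper functions. The names are hypothetical and the code is an illustration rather than a normative implementation; it follows the UNICHAR production shown earlier (two to six hexadecimal digits).

```python
import re

# Matches the UNICHAR form: "&#x" 2*6HEXDIG ";"
_NCR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")

def escape_field_body(text: str) -> str:
    """Replace '&' and every non-US-ASCII character by '&#xHHHH;'."""
    out = []
    for ch in text:
        if ch == "&" or ord(ch) > 0x7F:
            # At least two hexadecimal digits, as the grammar requires.
            out.append("&#x{:02X};".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)

def unescape_field_body(text: str) -> str:
    """Turn each numeric character reference back into its character."""
    # chr() raises ValueError for values above U+10FFFF.
    return _NCR.sub(lambda m: chr(int(m.group(1), 16)), text)
```

Because the AMPERSAND itself is always escaped, a field-body that has been escaped this way can be unescaped without ambiguity.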
All fields whose field-body contains a date value use the "full-date" All fields whose field-body contains a date value use the "full-date"
format specified in RFC 3339 [14]. For example: "2004-06-28" format specified in RFC 3339 [15]. For example: "2004-06-28"
represents June 28, 2004 in the Gregorian calendar. represents June 28, 2004 in the Gregorian calendar.
The first record in the file contains the single field whose field- The first record in the file contains the single field whose field-
name is "File-Date". The field-body of this record contains the last name is "File-Date". The field-body of this record contains the last
modification date of this copy of the registry, making it possible to modification date of this copy of the registry, making it possible to
compare different versions of the registry. The registry on the IANA compare different versions of the registry. The registry on the IANA
website is the most current. Versions with an older date than that website is the most current. Versions with an older date than that
one are not up-to-date. one are not up-to-date.
File-Date: 2004-06-28 File-Date: 2004-06-28
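Comparing two copies of the registry then reduces to comparing two dates. A minimal sketch, assuming the first record of each copy is available as a single "File-Date: ..." line; the function names are hypothetical.

```python
from datetime import date

def parse_file_date(line: str) -> date:
    """Parse a 'File-Date: YYYY-MM-DD' line into a date object."""
    name, sep, body = line.partition(":")
    if not sep or name.strip() != "File-Date":
        raise ValueError("not a File-Date field: %r" % line)
    # RFC 3339 "full-date" is the YYYY-MM-DD form accepted here.
    return date.fromisoformat(body.strip())

def is_older(local_line: str, reference_line: str) -> bool:
    """True if the local copy predates the reference copy."""
    return parse_file_date(local_line) < parse_file_date(reference_line)
```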
skipping to change at page 20, line 27 skipping to change at page 21, line 27
the registry. the registry.
The 'Subtag' or 'Tag' field MUST use lowercase letters to form the The 'Subtag' or 'Tag' field MUST use lowercase letters to form the
subtag or tag, with two exceptions. Subtags whose 'Type' field is subtag or tag, with two exceptions. Subtags whose 'Type' field is
'script' (in other words, subtags defined by ISO 15924) MUST use 'script' (in other words, subtags defined by ISO 15924) MUST use
titlecase. Subtags whose 'Type' field is 'region' (in other words, titlecase. Subtags whose 'Type' field is 'region' (in other words,
subtags defined by ISO 3166) MUST use uppercase. These exceptions subtags defined by ISO 3166) MUST use uppercase. These exceptions
mirror the use of case in the underlying standards. mirror the use of case in the underlying standards.
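The case rule can be expressed as a small function keyed on the record's 'Type' value. The function name is hypothetical and the sketch covers only the letter case of the 'Subtag' or 'Tag' value as stored in the registry.

```python
def registry_case(value: str, record_type: str) -> str:
    """Return the value in the case the registry uses for this Type."""
    if record_type == "script":
        # Titlecase: first letter uppercase, the rest lowercase.
        return value[:1].upper() + value[1:].lower()
    if record_type == "region":
        return value.upper()
    # Every other type is recorded in lowercase.
    return value.lower()
```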
The field 'Description' MAY appear more than one time. At least one The field 'Description' MAY appear more than one time. At least one
of the 'Description' fields must contain a description of the tag of the 'Description' fields MUST contain a description of the tag
being registered written or transcribed into the Latin script; the being registered written or transcribed into the Latin script; the
same or additional fields may also include a description in a non- same or additional fields MAY also include a description in a non-
Latin script. The 'Description' field is used for identification Latin script. The 'Description' field is used for identification
purposes and should not be taken to represent the actual native name purposes and SHOULD NOT be taken to represent the actual native name
of the language or variation or to be in any particular language. of the language or variation or to be in any particular language.
Most descriptions are taken directly from source standards such as Most descriptions are taken directly from source standards such as
ISO 639 or ISO 3166. ISO 639 or ISO 3166.
Note: Descriptions in registry entries that correspond to ISO 639, Note: Descriptions in registry entries that correspond to ISO 639,
ISO 15924, ISO 3166 or UN M.49 codes are intended only to indicate ISO 15924, ISO 3166 or UN M.49 codes are intended only to indicate
the meaning of that identifier as defined in the source standard at the meaning of that identifier as defined in the source standard at
the time it was added to the registry. The description does not the time it was added to the registry. The description does not
replace the content of the source standard itself. The descriptions replace the content of the source standard itself. The descriptions
are not intended to be the English localized names for the subtags. are not intended to be the English localized names for the subtags.
skipping to change at page 21, line 16 skipping to change at page 22, line 16
mapping to a complete language tag. mapping to a complete language tag.
o Deprecated o Deprecated
* Deprecated's field-value contains the date the record was * Deprecated's field-value contains the date the record was
deprecated. deprecated.
o Prefix o Prefix
* Prefix's field-value contains a language tag with which this * Prefix's field-value contains a language tag with which this
subtag may be used to form a new language tag, perhaps with subtag MAY be used to form a new language tag, perhaps with
other subtags as well. This field MUST only appear in records other subtags as well. This field MUST only appear in records
whose 'Type' field-value is 'variant' or 'extlang'. For whose 'Type' field-value is 'variant' or 'extlang'. For
example, the 'Prefix' for the variant 'scouse' is 'en', meaning example, the 'Prefix' for the variant 'scouse' is 'en', meaning
that the tags "en-scouse" and "en-GB-scouse" might be that the tags "en-scouse" and "en-GB-scouse" might be
appropriate while the tag "is-scouse" is not. appropriate while the tag "is-scouse" is not.
o Comments o Comments
* Comments contains additional information about the subtag, as * Comments contains additional information about the subtag, as
deemed appropriate for understanding the registry and deemed appropriate for understanding the registry and
skipping to change at page 21, line 40 skipping to change at page 22, line 40
* Suppress-Script contains a script subtag that SHOULD NOT be * Suppress-Script contains a script subtag that SHOULD NOT be
used to form language tags with the associated primary language used to form language tags with the associated primary language
subtag. This field MUST only appear in records whose 'Type' subtag. This field MUST only appear in records whose 'Type'
field-value is 'language'. See Section 4.1. field-value is 'language'. See Section 4.1.
The field 'Deprecated' MAY be added to any record via the maintenance The field 'Deprecated' MAY be added to any record via the maintenance
process described in Section 3.2 or via the registration process process described in Section 3.2 or via the registration process
described in Section 3.4. Usually the addition of a 'Deprecated' described in Section 3.4. Usually the addition of a 'Deprecated'
field is due to the action of one of the standards bodies, such as field is due to the action of one of the standards bodies, such as
ISO 3166, withdrawing a code. In some historical cases it may not ISO 3166, withdrawing a code. In some historical cases it might not
have been possible to reconstruct the original deprecation date. have been possible to reconstruct the original deprecation date.
For these cases, an approximate date appears in the registry. For these cases, an approximate date appears in the registry.
Although valid in language tags, subtags and tags with a 'Deprecated' Although valid in language tags, subtags and tags with a 'Deprecated'
field are deprecated and validating processors SHOULD NOT generate field are deprecated and validating processors SHOULD NOT generate
these subtags. Note that a record that contains a 'Deprecated' field these subtags. Note that a record that contains a 'Deprecated' field
and no corresponding 'Preferred-Value' field has no replacement and no corresponding 'Preferred-Value' field has no replacement
mapping. mapping.
The field 'Preferred-Value' contains a mapping between the record in The field 'Preferred-Value' contains a mapping between the record in
which it appears and a tag or subtag which should be preferred when which it appears and a tag or subtag which SHOULD be preferred when
selecting language tags. These values form three groups: selecting language tags. These values form three groups:
ISO 639 language codes which were later withdrawn in favor of ISO 639 language codes which were later withdrawn in favor of
other codes. These values are mostly a historical curiosity. other codes. These values are mostly a historical curiosity.
ISO 3166 region codes which have been withdrawn in favor of a new ISO 3166 region codes which have been withdrawn in favor of a new
code. This sometimes happens when a country changes its name or code. This sometimes happens when a country changes its name or
administration in such a way that warrants a new region code. administration in such a way that warrants a new region code.
Tags grandfathered from RFC 3066. In many cases these tags have Tags grandfathered from RFC 3066. In many cases these tags have
become obsolete because the values they represent were later become obsolete because the values they represent were later
encoded by ISO 639. encoded by ISO 639.
Records that contain a 'Preferred-Value' field MUST also have a Records that contain a 'Preferred-Value' field MUST also have a
'Deprecated' field. This field contains a date of deprecation. Thus 'Deprecated' field. This field contains a date of deprecation. Thus
a language tag processor can use the registry to construct the valid, a language tag processor can use the registry to construct the valid,
non-deprecated set of subtags for a given date. In addition, for any non-deprecated set of subtags for a given date. In addition, for any
given tag, a processor can construct the set of valid language tags given tag, a processor can construct the set of valid language tags
that correspond to that tag for all dates up to the date of the that correspond to that tag for all dates up to the date of the
registry. The ability to do these mappings may be beneficial to registry. The ability to do these mappings MAY be beneficial to
applications that are matching, selecting, or filtering content applications that are matching, selecting, or filtering content
based on its language tags. based on its language tags.
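The date-based reconstruction described above might look like the following sketch. It assumes each record has already been parsed into a dictionary keyed by field-name, and it assumes that a record counts as available from its 'Added' date until its 'Deprecated' date; the function name and the dictionary layout are hypothetical.

```python
from datetime import date

def active_subtags(records, on: date) -> set:
    """Return (Type, Subtag-or-Tag) pairs that are non-deprecated on a date."""
    active = set()
    for rec in records:
        # Assumption: a record is not usable before its 'Added' date.
        if date.fromisoformat(rec["Added"]) > on:
            continue
        deprecated = rec.get("Deprecated")
        if deprecated is not None and date.fromisoformat(deprecated) <= on:
            continue
        # Records carry either a 'Subtag' or a 'Tag' field.
        name = rec.get("Subtag") or rec.get("Tag")
        active.add((rec["Type"], name))
    return active
```

Calling the function with successive dates shows how the non-deprecated set changes over the life of the registry.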
It should be noted that 'Preferred-Value' mappings in records of type Note that 'Preferred-Value' mappings in records of type 'region' MAY
'region' may not represent exactly the same meaning as the original NOT represent exactly the same meaning as the original value. There
value. There are many reasons that a country code may be changed and are many reasons for a country code to be changed and the effect this
the effect this has on the formation of language tags may depend on has on the formation of language tags will depend on the nature of
the nature of the change in question. the change in question.
In particular, the 'Preferred-Value' field does not imply that In particular, the 'Preferred-Value' field does not imply retagging
content formerly tagged with one tag should be retagged. content that uses the affected subtag.
The field 'Preferred-Value' MUST NOT be modified once created in the The field 'Preferred-Value' MUST NOT be modified once created in the
registry. The field MAY be added to records of type "grandfathered" registry. The field MAY be added to records of type "grandfathered"
and "region" according to the rules in Section 3.2. Otherwise the and "region" according to the rules in Section 3.2. Otherwise the
field MUST NOT be added to any record already in the registry. field MUST NOT be added to any record already in the registry.
The 'Preferred-Value' field in records of type "grandfathered" and The 'Preferred-Value' field in records of type "grandfathered" and
"redundant" contains whole language tags that are strongly "redundant" contains whole language tags that are strongly
RECOMMENDED for use in place of the record's value. In many cases RECOMMENDED for use in place of the record's value. In many cases
the mappings were created by deprecation of the tags during the the mappings were created by deprecation of the tags during the
skipping to change at page 23, line 7 skipping to change at page 24, line 7
'nn'. 'nn'.
Records of type 'variant' MAY have more than one field of type Records of type 'variant' MAY have more than one field of type
'Prefix'. Additional fields of this type MAY be added to a 'variant' 'Prefix'. Additional fields of this type MAY be added to a 'variant'
record via the registration process. record via the registration process.
Records of type 'extlang' MUST have _exactly_ one 'Prefix' field. Records of type 'extlang' MUST have _exactly_ one 'Prefix' field.
The field-value of the 'Prefix' field consists of a language tag The field-value of the 'Prefix' field consists of a language tag
whose subtags are appropriate to use with this subtag. For example, whose subtags are appropriate to use with this subtag. For example,
the variant subtag 'scouse' has a recommended prefix of "en". This the variant subtag 'scouse' has a Prefix field of "en". This means
means that tags starting with the prefix "en-" are most appropriate that tags starting with the sequence "en-" are most appropriate with
with this subtag, so "en-Latn-scouse" and "en-GB-scouse" are both this subtag, so "en-Latn-scouse" and "en-GB-scouse" are both
acceptable, while the tag "fr-scouse" is an inappropriate choice. acceptable, while the tag "fr-scouse" is an inappropriate choice.
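A check of this kind is best done subtag by subtag rather than on raw strings, so that "en" does not accidentally match a tag beginning with a longer subtag. The sketch below is illustrative; the function name is hypothetical, and it assumes that case is ignored and that a Prefix must match the leading subtags of the tag.

```python
def matches_prefix(tag: str, prefixes) -> bool:
    """True if the tag begins with the subtags of at least one Prefix."""
    subtags = tag.lower().split("-")
    for prefix in prefixes:
        parts = prefix.lower().split("-")
        # Compare whole subtags, so "en" does not match "eng-...".
        if subtags[:len(parts)] == parts:
            return True
    return False
```

For the 'scouse' example, matches_prefix("en-GB-scouse", ["en"]) holds, while matches_prefix("fr-scouse", ["en"]) does not.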
The field of type 'Prefix' MUST NOT be removed from any record. The The field of type 'Prefix' MUST NOT be removed from any record. The
field-value for this type of field MUST NOT be modified. field-value for this type of field MUST NOT be modified.
The field 'Comments' MAY appear more than once per record. This The field 'Comments' MAY appear more than once per record. This
field MAY be inserted or changed via the registration process and no field MAY be inserted or changed via the registration process and no
guarantee of stability is provided. The content of this field is not guarantee of stability is provided. The content of this field is not
restricted, except by the need to register the information, the restricted, except by the need to register the information, the
suitability of the request, and by reasonable practical size suitability of the request, and by reasonable practical size
limitations. Long screeds about a particular subtag are frowned limitations. Long screeds about a particular subtag are frowned
upon. upon.
The field 'Suppress-Script' MUST only appear in records whose 'Type' The field 'Suppress-Script' MUST only appear in records whose 'Type'
field-value is 'language'. This field may appear at most one time in field-value is 'language'. This field MAY appear at most one time in
a record. This field indicates a script used to write the a record. This field indicates a script used to write the
overwhelming majority of documents for the given language and which overwhelming majority of documents for the given language and which
therefore adds no distinguishing information to a language tag. It therefore adds no distinguishing information to a language tag. It
helps ensure greater compatibility between the language tags helps ensure greater compatibility between the language tags
generated according to the rules in this document and language tags generated according to the rules in this document and language tags
and tag processors or consumers based on RFC 3066. For example, and tag processors or consumers based on RFC 3066. For example,
virtually all Icelandic documents are written in the Latin script, virtually all Icelandic documents are written in the Latin script,
making the subtag 'Latn' redundant in the tag "is-Latn". making the subtag 'Latn' redundant in the tag "is-Latn".
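A tag generator could apply this field as sketched below. The function name and the mapping argument are hypothetical, and the sketch assumes the script subtag, when present, immediately follows the primary language subtag (it does not handle extended language subtags).

```python
def drop_suppressed_script(tag: str, suppress_script: dict) -> str:
    """Remove the script subtag when it equals the language's Suppress-Script.

    suppress_script maps a primary language subtag to its Suppress-Script
    value, for example {"is": "Latn"}.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        suppressed = suppress_script.get(subtags[0].lower())
        if suppressed and subtags[1].lower() == suppressed.lower():
            del subtags[1]
    return "-".join(subtags)
```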
For examples of registry entries and their format, see Appendix C. For examples of registry entries and their format, see Appendix C.
3.2 Maintenance of the Registry 3.2 Maintenance of the Registry
Maintenance of the registry requires that as codes are assigned or Maintenance of the registry requires that as codes are assigned or
withdrawn by ISO 639, ISO 15924, and ISO 3166, the Language Subtag withdrawn by ISO 639, ISO 15924, and ISO 3166, the Language Subtag
Reviewer will evaluate each change, determine whether it conflicts Reviewer will evaluate each change, determine whether it conflicts
with existing registry entries, and submit the information to IANA with existing registry entries, and submit the information to IANA
for inclusion in the registry. If a change takes place and the for inclusion in the registry. If a change takes place and the
Language Subtag Reviewer does not do this in a timely manner, then Language Subtag Reviewer does not do this in a timely manner, then
any interested party may use the procedure in Section 3.4 to register any interested party MAY use the procedure in Section 3.4 to register
the appropriate update. the appropriate update.
Note: The redundant and grandfathered entries together are the Note: The redundant and grandfathered entries together are the
complete list of tags registered under RFC 3066 [23]. The redundant complete list of tags registered under RFC 3066 [24]. The redundant
tags are those that can now be formed using the subtags defined in tags are those that can now be formed using the subtags defined in
the registry together with the rules of Section 2.2. The the registry together with the rules of Section 2.2. The
grandfathered entries are those that can never be legal under those grandfathered entries are those that can never be legal under those
same provisions. same provisions.
The set of redundant and grandfathered tags is permanent and stable: The set of redundant and grandfathered tags is permanent and stable:
no new entries will be added and none of the entries will be removed. no new entries will be added and none of the entries will be removed.
Records of type 'grandfathered' may have their type converted to Records of type 'grandfathered' MAY have their type converted to
'redundant': see Section 3.7 for more information. 'redundant': see Section 3.7 for more information.
RFC 3066 tags that were deprecated prior to the adoption of this RFC 3066 tags that were deprecated prior to the adoption of this
document are part of the list of grandfathered tags and their document are part of the list of grandfathered tags and their
component subtags were not included as registered variants (although component subtags were not included as registered variants (although
they remain eligible for registration). For example, the tag "art- they remain eligible for registration). For example, the tag "art-
lojban" was deprecated in favor of the language subtag 'jbo'. lojban" was deprecated in favor of the language subtag 'jbo'.
The Language Subtag Reviewer MUST ensure that new subtags meet the The Language Subtag Reviewer MUST ensure that new subtags meet the
requirements in Section 4.1 or submit an appropriate alternate subtag requirements in Section 4.1 or submit an appropriate alternate subtag
as described in that section. If a change or addition to the as described in that section. If a change or addition to the
registry is required, the Language Subtag Reviewer will prepare the registry is needed, the Language Subtag Reviewer will prepare the
complete record, including all fields, and forward it to IANA for complete record, including all fields, and forward it to IANA for
insertion into the registry. If this represents a new subtag, then insertion into the registry. If this represents a new subtag, then
the message will indicate that this represents an INSERTION of a the message will indicate that this represents an INSERTION of a
record. If this represents a change to an existing subtag, then the record. If this represents a change to an existing subtag, then the
message must indicate that this represents a MODIFICATION, as shown message MUST indicate that this represents a MODIFICATION, as shown
in the following example: in the following example:
LANGUAGE SUBTAG MODIFICATION LANGUAGE SUBTAG MODIFICATION
File-Date: 2005-01-02 File-Date: 2005-01-02
%% %%
Type: variant Type: variant
Subtag: nedis Subtag: nedis
Description: Natisone dialect Description: Natisone dialect
Description: Nadiza dialect Description: Nadiza dialect
Added: 2003-10-09 Added: 2003-10-09
Prefix: sl Prefix: sl
Comments: This is a comment shown Comments: This is a comment shown
as an example. as an example.
%% %%
Figure 5 Figure 6
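Records in this layout can be read with a short parser. The sketch below is illustrative only: the function name is hypothetical, it assumes that a folded continuation line begins with whitespace, and it expects the registry text itself, without the "LANGUAGE SUBTAG MODIFICATION" heading of the message. Fields are kept as (field-name, field-body) pairs because 'Description', 'Prefix', and 'Comments' can repeat.

```python
def parse_records(text: str) -> list:
    """Split registry text into records, each a list of (name, body) pairs."""
    records = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped == "%%":
            # A line containing only "%%" separates records.
            if current:
                records.append(current)
            current = []
        elif line[0] in " \t" and current:
            # Assumption: leading whitespace marks a folded continuation.
            name, body = current[-1]
            current[-1] = (name, body + " " + stripped)
        else:
            name, sep, body = line.partition(":")
            if not sep:
                raise ValueError("expected 'field-name: field-body': %r" % line)
            current.append((name.strip(), body.strip()))
    if current:
        records.append(current)
    return records
```

Applied to the example above, the first record holds only the 'File-Date' field and the second holds the fields of the 'nedis' variant, with the folded 'Comments' value joined into one logical line.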
Whenever an entry is created or modified in the registry, the 'File- Whenever an entry is created or modified in the registry, the 'File-
Date' record at the start of the registry is updated to reflect the Date' record at the start of the registry is updated to reflect the
most recent modification date in the RFC 3339 [14] "full-date" most recent modification date in the RFC 3339 [15] "full-date"
format. format.
Values in the 'Subtag' field must be lowercase except as provided for Values in the 'Subtag' field MUST be lowercase except as provided for
in Section 3.1. in Section 3.1.
3.3 Stability of IANA Registry Entries 3.3 Stability of IANA Registry Entries
The stability of entries and their meaning in the registry is The stability of entries and their meaning in the registry is
critical to the long term stability of language tags. The rules in critical to the long term stability of language tags. The rules in
this section guarantee that a specific language tag's meaning is this section guarantee that a specific language tag's meaning is
stable over time and will not change. stable over time and will not change.
These rules specifically deal with how changes to codes (including These rules specifically deal with how changes to codes (including
withdrawal and deprecation of codes) maintained by ISO 639, ISO withdrawal and deprecation of codes) maintained by ISO 639, ISO
15924, ISO 3166, and UN M.49 are reflected in the IANA Language 15924, ISO 3166, and UN M.49 are reflected in the IANA Language
Subtag Registry. Assignments to the IANA Language Subtag Registry Subtag Registry. Assignments to the IANA Language Subtag Registry
MUST follow the following stability rules: MUST follow the following stability rules:
o Values in the fields 'Type', 'Subtag', 'Tag', 'Added', o Values in the fields 'Type', 'Subtag', 'Tag', 'Added',
'Deprecated' and 'Preferred-Value' MUST NOT be changed and are 'Deprecated' and 'Preferred-Value' MUST NOT be changed and are
guaranteed to be stable over time. guaranteed to be stable over time.
o Values in the 'Description' field MUST NOT be changed in a way o Values in the 'Description' field MUST NOT be changed in a way
that would invalidate previously-existing tags. They may be that would invalidate previously-existing tags. They MAY be
broadened somewhat in scope, changed to add information, or broadened somewhat in scope, changed to add information, or
adapted to the most common modern usage. For example, countries adapted to the most common modern usage. For example, countries
occasionally change their official names: an historical example of occasionally change their official names: an historical example of
this would be "Upper Volta" changing to "Burkina Faso". this would be "Upper Volta" changing to "Burkina Faso".
o Values in the field 'Prefix' MAY be added to records of type o Values in the field 'Prefix' MAY be added to records of type
'variant' via the registration process. 'variant' via the registration process.
o Values in the field 'Prefix' MAY be modified, so long as the o Values in the field 'Prefix' MAY be modified, so long as the
modifications broaden the set of recommended prefixes. That is, a modifications broaden the set of prefixes. That is, a prefix MAY
recommended prefix MAY be replaced by one of its own prefixes. be replaced by one of its own prefixes. For example, the prefix
For example, the prefix "en-US" could be replaced by "en", but not "en-US" could be replaced by "en", but not by the prefixes "en-
by the ranges "en-Latn", "fr", or "en-US-boont". Latn", "fr", or "en-US-boont". If one of those prefixes were
needed, a new Prefix SHOULD be registered.
o Values in the field 'Prefix' MUST NOT be removed. o Values in the field 'Prefix' MUST NOT be removed.
o The field 'Comments' MAY be added, changed, modified, or removed o The field 'Comments' MAY be added, changed, modified, or removed
via the registration process or any of the processes or via the registration process or any of the processes or
considerations described in this section. considerations described in this section.
o The field 'Suppress-Script' MAY be added or removed via the o The field 'Suppress-Script' MAY be added or removed via the
registration process. registration process.
skipping to change at page 26, line 6 skipping to change at page 27, line 7
conflict with existing subtags of the associated type and whose conflict with existing subtags of the associated type and whose
meaning is not the same as an existing subtag of the same type are meaning is not the same as an existing subtag of the same type are
entered into the IANA registry as new records and their value is entered into the IANA registry as new records and their value is
canonical for the meaning assigned to them. canonical for the meaning assigned to them.
o Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are o Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are
withdrawn by their respective maintenance or registration withdrawn by their respective maintenance or registration
authority remain valid in language tags. A 'Deprecated' field authority remain valid in language tags. A 'Deprecated' field
containing the date of withdrawal is added to the record. If a new containing the date of withdrawal is added to the record. If a new
record of the same type is added that represents a replacement record of the same type is added that represents a replacement
value, then a 'Preferred-Value' field may also be added. The value, then a 'Preferred-Value' field MAY also be added. The
registration process MAY be used to add comments about the registration process MAY be used to add comments about the
withdrawal of the code by the respective standard. withdrawal of the code by the respective standard.
* The region code 'TL' was assigned to the country 'Timor-Leste', * The region code 'TL' was assigned to the country 'Timor-Leste',
replacing the code 'TP' (which was assigned to 'East Timor' replacing the code 'TP' (which was assigned to 'East Timor'
when it was under administration by Portugal). The subtag 'TP' when it was under administration by Portugal). The subtag 'TP'
remains valid in language tags, but its record contains a remains valid in language tags, but its record contains a
'Preferred-Value' of 'TL' and its field 'Deprecated' contains 'Preferred-Value' of 'TL' and its field 'Deprecated' contains
the date the new code was assigned ('2004-07-06'). the date the new code was assigned ('2004-07-06').
o Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict o Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict
with existing subtags of the associated type, including subtags with existing subtags of the associated type, including subtags
that are deprecated, MUST NOT be entered into the registry. The that are deprecated, MUST NOT be entered into the registry. The
following additional considerations apply: following additional considerations apply:
* For ISO 639 codes, if the newly assigned code's meaning is not * For ISO 639 codes, if the newly assigned code's meaning is not
represented by a subtag in the IANA registry, the Language represented by a subtag in the IANA registry, the Language
Subtag Reviewer, as described in Section 3.4, shall prepare a Subtag Reviewer, as described in Section 3.4, SHALL prepare a
proposal for entering in the IANA registry as soon as practical proposal for entering in the IANA registry as soon as practical
a registered language subtag as an alternate value for the new a registered language subtag as an alternate value for the new
code. The form of the registered language subtag will be at code. The form of the registered language subtag will be at
the discretion of the Language Subtag Reviewer and must conform the discretion of the Language Subtag Reviewer and MUST conform
to other restrictions on language subtags in this document. to other restrictions on language subtags in this document.
* For all subtags whose meaning is derived from an external * For all subtags whose meaning is derived from an external
standard (i.e. ISO 639, ISO 15924, ISO 3166, or UN M.49), if a standard (i.e. ISO 639, ISO 15924, ISO 3166, or UN M.49), if a
new meaning is assigned to an existing code and the new meaning new meaning is assigned to an existing code and the new meaning
broadens the meaning of that code, then the meaning for the broadens the meaning of that code, then the meaning for the
associated subtag MAY be changed to match. The meaning of a associated subtag MAY be changed to match. The meaning of a
subtag MUST NOT be narrowed, however, as this can result in an subtag MUST NOT be narrowed, however, as this can result in an
unknown proportion of the existing uses of a subtag becoming unknown proportion of the existing uses of a subtag becoming
invalid. Note: ISO 639 MA/RA has adopted a similar stability invalid. Note: ISO 639 MA/RA has adopted a similar stability
policy. policy.
* For ISO 15924 codes, if the newly assigned code's meaning is * For ISO 15924 codes, if the newly assigned code's meaning is
not represented by a subtag in the IANA registry, the Language not represented by a subtag in the IANA registry, the Language
Subtag Reviewer, as described in Section 3.4, shall prepare a Subtag Reviewer, as described in Section 3.4, SHALL prepare a
proposal for entering in the IANA registry as soon as practical proposal for entering in the IANA registry as soon as practical
a registered variant subtag as an alternate value for the new a registered variant subtag as an alternate value for the new
code. The form of the registered variant subtag will be at the code. The form of the registered variant subtag will be at the
discretion of the Language Subtag Reviewer and must conform to discretion of the Language Subtag Reviewer and MUST conform to
other restrictions on variant subtags in this document. other restrictions on variant subtags in this document.
* For ISO 3166 codes, if the newly assigned code's meaning is * For ISO 3166 codes, if the newly assigned code's meaning is
associated with the same UN M.49 code as another 'region' associated with the same UN M.49 code as another 'region'
subtag, then the existing region subtag remains as the subtag, then the existing region subtag remains as the
preferred value for that region and no new entry is created. A preferred value for that region and no new entry is created. A
comment MAY be added to the existing region subtag indicating comment MAY be added to the existing region subtag indicating
the relationship to the new ISO 3166 code. the relationship to the new ISO 3166 code.
* For ISO 3166 codes, if the newly assigned code's meaning is * For ISO 3166 codes, if the newly assigned code's meaning is
associated with a UN M.49 code that is not represented by an associated with a UN M.49 code that is not represented by an
existing region subtag, then the Language Subtag Reviewer, existing region subtag, then the Language Subtag Reviewer,
as described in Section 3.4, shall prepare a proposal for as described in Section 3.4, SHALL prepare a proposal for
entering the appropriate numeric UN country code as an entry in entering the appropriate numeric UN country code as an entry in
the IANA registry. the IANA registry.
* For ISO 3166 codes, if there is no associated UN numeric code, * For ISO 3166 codes, if there is no associated UN numeric code,
then the Language Subtag Reviewer SHALL petition the UN to then the Language Subtag Reviewer SHALL petition the UN to
create one. If there is no response from the UN within ninety create one. If there is no response from the UN within ninety
days of the request being sent, the Language Subtag Reviewer days of the request being sent, the Language Subtag Reviewer
shall prepare a proposal for entering in the IANA registry as SHALL prepare a proposal for entering in the IANA registry as
soon as practical a registered variant subtag as an alternate soon as practical a registered variant subtag as an alternate
value for the new code. The form of the registered variant value for the new code. The form of the registered variant
subtag will be at the discretion of the Language Subtag subtag will be at the discretion of the Language Subtag
Reviewer and must conform to other restrictions on variant Reviewer and MUST conform to other restrictions on variant
subtags in this document. This situation is very unlikely to subtags in this document. This situation is very unlikely to
ever occur. ever occur.
o Stability provisions apply to grandfathered tags with this o Stability provisions apply to grandfathered tags with this
exception: should all of the subtags in a grandfathered tag become exception: should all of the subtags in a grandfathered tag become
valid subtags in the IANA registry, then the field 'Type' in that valid subtags in the IANA registry, then the field 'Type' in that
record is changed from 'grandfathered' to 'redundant'. Note that record is changed from 'grandfathered' to 'redundant'. Note that
this will not affect language tags that match the grandfathered this will not affect language tags that match the grandfathered
tag, since these tags will now match valid generative subtag tag, since these tags will now match valid generative subtag
sequences. For example, if the subtag 'gan' in the language tag sequences. For example, if the subtag 'gan' in the language tag
skipping to change at page 27, line 46 skipping to change at page 28, line 49
then the grandfathered tag "zh-gan" would be deprecated (but then the grandfathered tag "zh-gan" would be deprecated (but
existing content or implementations that use "zh-gan" would remain existing content or implementations that use "zh-gan" would remain
valid). valid).
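One of the stability rules in this section allows a 'Prefix' value to be modified only when the change broadens it, that is, when the new prefix is itself a prefix of the old one. A minimal sketch of that test follows. The function name is_broadening is hypothetical, and comparing whole subtags case-insensitively is an assumption about how "prefix" is meant here.

```python
def subtags(tag):
    """Split a tag or prefix into lowercase subtags."""
    return tag.lower().split("-")


def is_broadening(old_prefix, new_prefix):
    """True if new_prefix is a shorter leading subtag sequence of old_prefix."""
    old, new = subtags(old_prefix), subtags(new_prefix)
    return len(new) < len(old) and old[:len(new)] == new


# The cases named in the rule: only "en" broadens "en-US".
for candidate in ("en", "en-Latn", "fr", "en-US-boont"):
    print(candidate, is_broadening("en-US", candidate))
```

Comparing subtag lists rather than raw strings avoids treating "e" or "en-U" as a prefix of "en-US".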
3.4 Registration Procedure for Subtags 3.4 Registration Procedure for Subtags
The procedure given here MUST be used by anyone who wants to use a The procedure given here MUST be used by anyone who wants to use a
subtag not currently in the IANA Language Subtag Registry. subtag not currently in the IANA Language Subtag Registry.
Only subtags of type 'language' and 'variant' will be considered for Only subtags of type 'language' and 'variant' will be considered for
independent registration of new subtags. Handling of subtags independent registration of new subtags. Handling of subtags needed
required for stability and subtags required to keep the registry for stability and subtags necessary to keep the registry synchronized
synchronized with ISO 639, ISO 15924, ISO 3166, and UN M.49 within with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits
the limits defined by this document are described in Section 3.2. defined by this document are described in Section 3.2. Stability
Stability provisions are described in Section 3.3. provisions are described in Section 3.3.
This procedure MAY also be used to register or alter the information This procedure MAY also be used to register or alter the information
for the "Description", "Comments", "Deprecated", or "Prefix" fields for the "Description", "Comments", "Deprecated", or "Prefix" fields
in a subtag's record as described in Figure 8. Changes to all other in a subtag's record as described in Figure 9. Changes to all other
fields in the IANA registry are NOT permitted. fields in the IANA registry are NOT permitted.
Registering a new subtag or requesting modifications to an existing Registering a new subtag or requesting modifications to an existing
tag or subtag starts with the requester filling out the registration tag or subtag starts with the requester filling out the registration
form reproduced below. Note that each response is not limited in form reproduced below. Note that each response is not limited in
size and should take the room necessary to adequately describe the size so that the request can adequately describe the registration.
registration. The fields in the "Record Requested" section SHOULD The fields in the "Record Requested" section SHOULD follow the
follow the requirements in Section 3.1. requirements in Section 3.1.
LANGUAGE SUBTAG REGISTRATION FORM LANGUAGE SUBTAG REGISTRATION FORM
1. Name of requester: 1. Name of requester:
2. E-mail address of requester: 2. E-mail address of requester:
3. Record Requested: 3. Record Requested:
Type: Type:
Subtag: Subtag:
Description: Description:
Prefix: Prefix:
Preferred-Value: Preferred-Value:
Deprecated: Deprecated:
Suppress-Script: Suppress-Script:
Comments: Comments:
4. Intended meaning of the subtag: 4. Intended meaning of the subtag:
5. Reference to published description 5. Reference to published description
of the language (book or article): of the language (book or article):
6. Any other relevant information: 6. Any other relevant information:
Figure 6 Figure 7
The subtag registration form MUST be sent to The subtag registration form MUST be sent to
<ietf-languages@iana.org> for a two week review period before it can <ietf-languages@iana.org> for a two week review period before it can
be submitted to IANA. (This is an open list. Requests to be added be submitted to IANA. (This is an open list and can be joined by
should be sent to <ietf-languages-request@iana.org>.) sending a request to <ietf-languages-request@iana.org>.)
Variant and extlang subtags are always registered for use with a Variant and extlang subtags are always registered for use with a
particular range of language tags. For example, the subtag 'scouse' particular range of language tags. For example, the subtag 'scouse'
is intended for use with language tags that start with the primary is intended for use with language tags that start with the primary
language subtag "en", since Scouse is a dialect of English. Thus the language subtag "en", since Scouse is a dialect of English. Thus the
subtag 'scouse' could be included in tags such as "en-Latn-scouse" or subtag 'scouse' could be included in tags such as "en-Latn-scouse" or
"en-GB-scouse". This information is stored in the "Prefix" field in "en-GB-scouse". This information is stored in the "Prefix" field in
the registry. Variant registration requests are REQUIRED to include the registry. Variant registration requests are REQUIRED to include
at least one "Prefix" field in the registration form. at least one "Prefix" field in the registration form.
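The relationship between a variant and its registered prefixes can be sketched as a simple check. This is an illustration only: the function names are hypothetical, and because the 'Prefix' field is described as a guide to usage, a False result means "outside the registered range", not "invalid tag".

```python
def matches_prefix(tag, prefix):
    """True if the tag begins with the prefix, compared subtag by subtag."""
    tag_parts = tag.lower().split("-")
    prefix_parts = prefix.lower().split("-")
    return tag_parts[:len(prefix_parts)] == prefix_parts


def follows_registered_prefix(tag, variant, prefixes):
    """True if the tag uses the variant and starts with a registered prefix."""
    if variant.lower() not in tag.lower().split("-"):
        return False
    return any(matches_prefix(tag, prefix) for prefix in prefixes)


# 'scouse' with the prefix "en", as in the example above.
for tag in ("en-Latn-scouse", "en-GB-scouse", "de-scouse"):
    print(tag, follows_registered_prefix(tag, "scouse", ["en"]))
```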
The 'Prefix' field for a given registered subtag will be maintained The 'Prefix' field for a given registered subtag will be maintained
in the IANA registry as a guide to usage. Additional prefixes MAY be in the IANA registry as a guide to usage. Additional prefixes MAY be
added by filing an additional registration form. In that form, the added by filing an additional registration form. In that form, the
"Any other relevant information:" field should indicate that it is "Any other relevant information:" field MUST indicate that it is the
the addition of a prefix. addition of a prefix.
Requests to add a prefix to a variant subtag that imply a different Requests to add a prefix to a variant subtag that imply a different
semantic meaning will probably be rejected. For example, a request semantic meaning will probably be rejected. For example, a request
to add the prefix "de" to the subtag 'nedis' so that the tag "de- to add the prefix "de" to the subtag 'nedis' so that the tag "de-
nedis" represented some German dialect would be rejected. The nedis" represented some German dialect would be rejected. The
'nedis' subtag represents a particular Slovenian dialect and the 'nedis' subtag represents a particular Slovenian dialect and the
additional registration would change the semantic meaning assigned to additional registration would change the semantic meaning assigned to
the subtag. A separate subtag should be proposed instead. the subtag. A separate subtag SHOULD be proposed instead.
The 'Description' field must contain a description of the tag being The 'Description' field MUST contain a description of the tag being
registered written or transcribed into the Latin script; it may also registered written or transcribed into the Latin script; it MAY also
include a description in a non-Latin script. Non-ASCII characters include a description in a non-Latin script. Non-ASCII characters
must be escaped using the syntax described in Section 3.1. The MUST be escaped using the syntax described in Section 3.1. The
'Description' field is used for identification purposes and should 'Description' field is used for identification purposes and doesn't
not be taken to represent the actual native name of the language or necessarily represent the actual native name of the language or
variation or to be in any particular language. variation or to be in any particular language.
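The escaping step might look like the sketch below. Section 3.1 is not reproduced in this excerpt, so the escape form used here, an XML-style hexadecimal numeric character reference such as "&#xF1;", is an assumption and should be checked against that section; the function name escape_non_ascii is hypothetical.

```python
def escape_non_ascii(text):
    """Replace each non-ASCII character with an assumed '&#xHEX;' escape."""
    out = []
    for ch in text:
        if ord(ch) > 0x7F:
            # Assumed syntax: hexadecimal numeric character reference.
            out.append("&#x{:X};".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


print(escape_non_ascii("Espa\u00f1ol"))
```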
While the 'Description' field itself is not guaranteed to be stable While the 'Description' field itself is not guaranteed to be stable
and errata corrections may be undertaken from time to time, attempts and errata corrections MAY be undertaken from time to time, attempts
to provide translations or transcriptions of entries in the registry to provide translations or transcriptions of entries in the registry
itself will probably be frowned upon by the community or rejected itself will probably be frowned upon by the community or rejected
outright, as changes of this nature may impact the provisions in outright, as changes of this nature have an impact on the provisions
Section 3.3. in Section 3.3.
The Language Subtag Reviewer is responsible for responding to The Language Subtag Reviewer is responsible for responding to
requests for the registration of subtags through the registration requests for the registration of subtags through the registration
process and is appointed by the IESG. process and is appointed by the IESG.
When the two week period has passed the Language Subtag Reviewer When the two week period has passed the Language Subtag Reviewer
either forwards the record to be inserted or modified to either forwards the record to be inserted or modified to
iana@iana.org according to the procedure described in Section 3.2, or iana@iana.org according to the procedure described in Section 3.2, or
rejects the request because of significant objections raised on the rejects the request because of significant objections raised on the
list or due to problems with constraints in this document (which list or due to problems with constraints in this document (which MUST
should be explicitly cited). The reviewer may also extend the review be explicitly cited). The reviewer MAY also extend the review period
period in two week increments to permit further discussion. The in two week increments to permit further discussion. The reviewer
reviewer must indicate on the list whether the registration has been MUST indicate on the list whether the registration has been accepted,
accepted, rejected, or extended following each two week period. rejected, or extended following each two week period.
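The timing described above reduces to simple date arithmetic: one two week review period, plus one further two week increment for each extension. A minimal sketch, with the hypothetical helper name review_deadline and illustrative dates:

```python
import datetime

REVIEW_PERIOD = datetime.timedelta(weeks=2)


def review_deadline(sent_on, extensions=0):
    """Date on which the reviewer's next decision is due.

    sent_on    -- date the form was sent to the list
    extensions -- number of two week extensions granted so far
    """
    return sent_on + REVIEW_PERIOD * (1 + extensions)


sent = datetime.date(2005, 6, 3)  # illustrative date only
print(review_deadline(sent))
print(review_deadline(sent, extensions=1))
```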
Note that the reviewer can raise objections on the list if he or she Note that the reviewer can raise objections on the list if he or she
so desires. The important thing is that the objection must be made so desires. The important thing is that the objection MUST be made
publicly. publicly.
The applicant is free to modify a rejected application with The applicant is free to modify a rejected application with
additional information and submit it again; this restarts the two additional information and submit it again; this restarts the two
week comment period. week comment period.
Decisions made by the reviewer may be appealed to the IESG [RFC 2028] Decisions made by the reviewer MAY be appealed to the IESG [RFC 2028]
[9] under the same rules as other IETF decisions [RFC 2026] [8]. [9] under the same rules as other IETF decisions [RFC 2026] [8].
All approved registration forms are available online in the directory All approved registration forms are available online in the directory
http://www.iana.org/numbers.html under "languages". http://www.iana.org/numbers.html under "languages".
Updates or changes to existing records, including previous Updates or changes to existing records, including previous
registrations, follow the same procedure as new registrations. The registrations, follow the same procedure as new registrations. The
Language Subtag Reviewer decides whether there is consensus to update Language Subtag Reviewer decides whether there is consensus to update
the registration following the two week review period; normally the registration following the two week review period; normally
objections by the original registrant will carry extra weight in objections by the original registrant will carry extra weight in
skipping to change at page 30, line 29 skipping to change at page 31, line 32
Registrations are permanent and stable. Once registered, subtags Registrations are permanent and stable. Once registered, subtags
will not be removed from the registry and will remain a valid way in will not be removed from the registry and will remain a valid way in
which to specify a specific language or variant. which to specify a specific language or variant.
Note: The "Description" in the registration form is Note: The "Description" in the registration form is
intended as an aid to people trying to verify whether a language is intended as an aid to people trying to verify whether a language is
registered or what language or language variation a particular subtag registered or what language or language variation a particular subtag
refers to. In most cases, reference to an authoritative grammar or refers to. In most cases, reference to an authoritative grammar or
dictionary of that language will be useful; in cases where no such dictionary of that language will be useful; in cases where no such
work exists, other well known works describing that language or in work exists, other well known works describing that language or in
that language may be appropriate. The subtag reviewer decides what that language MAY be appropriate. The subtag reviewer decides what
constitutes "good enough" reference material. This requirement is constitutes "good enough" reference material. This requirement is
not intended to exclude particular languages or dialects due to the not intended to exclude particular languages or dialects due to the
size of the speaker population or lack of a standardized orthography. size of the speaker population or lack of a standardized orthography.
Minority languages will be considered equally on their own merits. Minority languages will be considered equally on their own merits.
3.5 Possibilities for Registration 3.5 Possibilities for Registration
Possibilities for registration of subtags or information about Possibilities for registration of subtags or information about
subtags include: subtags include:
skipping to change at page 31, line 10 skipping to change at page 32, line 12
authorities, or which have never been attempted for registration authorities, or which have never been attempted for registration
with those authorities. If ISO 639 has previously rejected a with those authorities. If ISO 639 has previously rejected a
language for registration, it is reasonable to assume that there language for registration, it is reasonable to assume that there
must be additional very compelling evidence of need before it will must be additional very compelling evidence of need before it will
be registered in the IANA registry (to the extent that it is very be registered in the IANA registry (to the extent that it is very
unlikely that any subtags will be registered of this type). unlikely that any subtags will be registered of this type).
o Dialect or other divisions or variations within a language, its o Dialect or other divisions or variations within a language, its
orthography, writing system, regional or historical usage, orthography, writing system, regional or historical usage,
transliteration or other transformation, or distinguishing transliteration or other transformation, or distinguishing
variation may be registered as variant subtags. An example is the variation MAY be registered as variant subtags. An example is the
'scouse' subtag (the Scouse dialect of English). 'scouse' subtag (the Scouse dialect of English).
o The addition or maintenance of fields (generally of an o The addition or maintenance of fields (generally of an
informational nature) in Tag or Subtag records as described in informational nature) in Tag or Subtag records as described in
Section 3.1 and subject to the stability provisions in Section 3.1 and subject to the stability provisions in
Section 3.3. This includes descriptions; comments; deprecation Section 3.3. This includes descriptions; comments; deprecation
and preferred values for obsolete or withdrawn codes; or the and preferred values for obsolete or withdrawn codes; or the
addition of script or extlang information to primary language addition of script or extlang information to primary language
subtags. subtags.
skipping to change at page 32, line 36 skipping to change at page 33, line 38
3.6 Extensions and Extensions Namespace 3.6 Extensions and Extensions Namespace
Extension subtags are those introduced by single-letter subtags other Extension subtags are those introduced by single-letter subtags other
than 'x'. They are reserved for the generation of identifiers which than 'x'. They are reserved for the generation of identifiers which
contain a language component, and are compatible with applications contain a language component, and are compatible with applications
that understand language tags. For example, they might be used to that understand language tags. For example, they might be used to
define locale identifiers, which are generally based on language. define locale identifiers, which are generally based on language.
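A processor that wants to recognize extensions only needs to find the single-letter subtags that introduce them. The sketch below is one way to do that under stated assumptions: the function name is hypothetical, the first subtag is skipped because it is the primary language position, everything after an 'x' subtag is treated as private use, and the tag "en-a-bbb-x-a-ccc" is a made-up example.

```python
def extension_singletons(tag):
    """Return the single-letter subtags (other than 'x') that start extensions."""
    found = []
    parts = tag.lower().split("-")
    for part in parts[1:]:  # skip the first (primary language) position
        if part == "x":
            # Assumption: all subtags after 'x' are private use,
            # so a later single letter is not an extension singleton.
            break
        if len(part) == 1 and part.isalpha():
            found.append(part)
    return found


print(extension_singletons("en-a-bbb-x-a-ccc"))
```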
The structure and form of extensions are defined by this document so The structure and form of extensions are defined by this document so
that implementations can be created that are forward compatible with that implementations can be created that are forward compatible with
applications that may be created using single-letter subtags in the applications that might be created using single-letter subtags in the
future. In addition, defining a mechanism for maintaining single- future. In addition, defining a mechanism for maintaining single-
letter subtags will lend to the stability of this document by letter subtags will lend to the stability of this document by
reducing the likely need for future revisions or updates. reducing the likely need for future revisions or updates.
Allocation of a single-letter subtag shall take the form of an RFC Allocation of a single-letter subtag SHALL take the form of an RFC
defining the name, purpose, processes, and procedures for maintaining defining the name, purpose, processes, and procedures for maintaining
the subtags. The maintaining or registering authority, including the subtags. The maintaining or registering authority, including
name, contact email, discussion list email, and URL location of the name, contact email, discussion list email, and URL location of the
registry must be indicated clearly in the RFC. The RFC MUST specify registry MUST be indicated clearly in the RFC. The RFC MUST specify
or include each of the following: or include each of the following:
o The specification MUST reference the specific version or revision o The specification MUST reference the specific version or revision
of this document that governs its creation and MUST reference this of this document that governs its creation and MUST reference this
section of this document. section of this document.
o The specification and all subtags defined by the specification o The specification and all subtags defined by the specification
MUST follow the ABNF and other rules for the formation of tags and MUST follow the ABNF and other rules for the formation of tags and
subtags as defined in this document. In particular it MUST subtags as defined in this document. In particular it MUST
specify that case is not significant and that subtags MUST NOT specify that case is not significant and that subtags MUST NOT
skipping to change at page 33, line 34 skipping to change at page 34, line 35
once defined by a specification, MUST NOT be retracted or change once defined by a specification, MUST NOT be retracted or change
in meaning in any substantial way. in meaning in any substantial way.
o The specification MUST include in a separate section the o The specification MUST include in a separate section the
registration form reproduced in this section (below) to be used in registration form reproduced in this section (below) to be used in
registering the extension upon publication as an RFC. registering the extension upon publication as an RFC.
o IANA MUST be informed of changes to the contact information and o IANA MUST be informed of changes to the contact information and
URL for the specification. URL for the specification.
o Modified the latin-script requirement on the 'Description' field
so that "at least one Description field" must contain a Latin
transcription. (A.Phillips)
IANA will maintain a registry of allocated single-letter (singleton) IANA will maintain a registry of allocated single-letter (singleton)
subtags. This registry will use the record-jar format described by subtags. This registry will use the record-jar format described by
the ABNF in Section 3.1. Upon publication of an extension as an RFC, the ABNF in Section 3.1. Upon publication of an extension as an RFC,
the maintaining authority defined in the RFC must forward this the maintaining authority defined in the RFC MUST forward this
registration form to iesg@ietf.org, who will forward the request to registration form to iesg@ietf.org, who will forward the request to
iana@iana.org. The maintaining authority of the extension MUST iana@iana.org. The maintaining authority of the extension MUST
maintain the accuracy of the record by sending an updated full copy maintain the accuracy of the record by sending an updated full copy
of the record to iana@iana.org with the subject line "LANGUAGE TAG of the record to iana@iana.org with the subject line "LANGUAGE TAG
EXTENSION UPDATE" whenever content changes. Only the 'Comments', EXTENSION UPDATE" whenever content changes. Only the 'Comments',
'Contact_Email', 'Mailing_List', and 'URL' fields may be modified in 'Contact_Email', 'Mailing_List', and 'URL' fields MAY be modified in
these updates. these updates.
Failure to maintain this record, the corresponding registry, or meet Failure to maintain this record, the corresponding registry, or meet
other conditions imposed by this section of this document may be other conditions imposed by this section of this document MAY be
appealed to the IESG [RFC 2028] [9] under the same rules as other appealed to the IESG [RFC 2028] [9] under the same rules as other
IETF decisions (see [8]) and may result in the authority to maintain IETF decisions (see [8]) and MAY result in the authority to maintain
the extension being withdrawn or reassigned by the IESG. the extension being withdrawn or reassigned by the IESG.
%% %%
Identifier: Identifier:
Description: Description:
Comments: Comments:
Added: Added:
RFC: RFC:
Authority: Authority:
Contact_Email: Contact_Email:
Mailing_List: Mailing_List:
URL: URL:
skipping to change at page 34, line 18 skipping to change at page 35, line 17
Description: Description:
Comments: Comments:
Added: Added:
RFC: RFC:
Authority: Authority:
Contact_Email: Contact_Email:
Mailing_List: Mailing_List:
URL: URL:
%% %%
Figure 7: Format of Records in the Language Tag Extensions Registry Figure 8: Format of Records in the Language Tag Extensions Registry
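As a non-normative illustration, the record layout in the figure above can be read with a few lines of Python. This is a minimal sketch only: it assumes one "Field: value" pair per line with "%%" separating records, it does not handle any line folding that the ABNF in Section 3.1 might permit, and the function names and sample values are hypothetical. The set of updatable fields is taken from the text above ('Comments', 'Contact_Email', 'Mailing_List', and 'URL').

```python
# Fields that the text above allows a maintaining authority to modify.
UPDATABLE_FIELDS = {"Comments", "Contact_Email", "Mailing_List", "URL"}


def parse_records(text):
    """Split record-jar style text into a list of {field: value} dicts.

    Assumption: each field sits on a single line; folded lines are ignored.
    """
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == "%%":
            if current:
                records.append(current)
            current = {}
        elif ":" in line:
            # Split on the first colon only, so URLs keep their own colons.
            name, _, value = line.partition(":")
            current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def disallowed_changes(old, new):
    """Return the names of changed fields that are not updatable."""
    names = set(old) | set(new)
    return sorted(
        n for n in names
        if old.get(n) != new.get(n) and n not in UPDATABLE_FIELDS
    )


# Hypothetical sample record; the values are invented for illustration.
SAMPLE = """%%
Identifier: r
Description: Example extension
Added: 2004-06-28
URL: http://example.org/ext
%%
"""

records = parse_records(SAMPLE)
updated = dict(records[0], URL="http://example.org/ext2", Added="2005-01-01")
print(disallowed_changes(records[0], updated))
```

In this sketch a change to 'URL' is accepted while a change to 'Added' is reported, mirroring the update rule stated above.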
'Identifier' contains the single letter subtag (singleton) assigned 'Identifier' contains the single letter subtag (singleton) assigned
to the extension. The Internet-Draft submitted to define the to the extension. The Internet-Draft submitted to define the
extension should specific which letter to use, although the IESG may extension SHOULD specify which letter to use, although the IESG MAY
change the assignment when approving the RFC. change the assignment when approving the RFC.
'Description' contains the name and description of the extension. 'Description' contains the name and description of the extension.
'Comments' is an optional field and may contain a broader description 'Comments' is an OPTIONAL field and MAY contain a broader description
of the extension. of the extension.
'Added' contains the date the RFC was published in the "full-date" 'Added' contains the date the RFC was published in the "full-date"
format specified in RFC 3339 [14]. For example: 2004-06-28 format specified in RFC 3339 [15]. For example: 2004-06-28
represents June 28, 2004, in the Gregorian calendar. represents June 28, 2004, in the Gregorian calendar.
'RFC' contains the RFC number assigned to the extension. 'RFC' contains the RFC number assigned to the extension.
'Authority' contains the name of the maintaining authority for the 'Authority' contains the name of the maintaining authority for the
extension. extension.
'Contact_Email' contains the email address used to contact the 'Contact_Email' contains the email address used to contact the
maintaining authority. maintaining authority.
skipping to change at page 35, line 10 skipping to change at page 36, line 7
The determination of whether an Internet-Draft meets the above The determination of whether an Internet-Draft meets the above
conditions and the decision to grant or withhold such authority rests conditions and the decision to grant or withhold such authority rests
solely with the IESG, and is subject to the normal review and appeals solely with the IESG, and is subject to the normal review and appeals
process associated with the RFC process. process associated with the RFC process.
Extension authors are strongly cautioned that many (including most Extension authors are strongly cautioned that many (including most
well-formed) processors will be unaware of any special relationships well-formed) processors will be unaware of any special relationships
or meaning inherent in the order of extension subtags. Extension or meaning inherent in the order of extension subtags. Extension
authors SHOULD avoid subtag relationships or canonicalization authors SHOULD avoid subtag relationships or canonicalization
mechanisms that interfere with matching or with length restrictions mechanisms that interfere with matching or with length restrictions
that may exist in common protocols where the extension is used. In that sometimes exist in common protocols where the extension is used.
particular, applications may truncate the subtags in doing matching In particular, applications MAY truncate the subtags in doing
or in fitting into limited lengths, so it is RECOMMENDED that the matching or in fitting into limited lengths, so it is RECOMMENDED
most significant information be in the most significant (left-most) that the most significant information be in the most significant
subtags, and that the specification gracefully handle truncated (left-most) subtags, and that the specification gracefully handle
subtags. truncated subtags.
When a language tag is to be used in a specific, known, protocol, it When a language tag is to be used in a specific, known, protocol, it
is RECOMMENDED that the language tag not contain extensions not is RECOMMENDED that the language tag not contain extensions not
supported by that protocol. In addition, it should be noted that supported by that protocol. In addition, note that some protocols
some protocols may impose upper limits on the length of the strings MAY impose upper limits on the length of the strings used to store or
used to store or transport the language tag. transport the language tag.
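The truncation behavior that extension authors are cautioned about can be pictured as follows. This is a hedged sketch of one way an application might shorten a tag to fit a length limit; the function name and the decision to drop a dangling singleton are assumptions, not requirements of this document.

```python
def truncate_tag(tag, max_len):
    """Drop right-most subtags until the tag fits within max_len characters.

    Assumption: truncation happens only at subtag boundaries, and a trailing
    single-letter subtag (singleton) is dropped because it would introduce
    an extension with no subtags. The primary subtag is never removed.
    """
    subtags = tag.split("-")
    while len(subtags) > 1 and len("-".join(subtags)) > max_len:
        subtags.pop()
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)


print(truncate_tag("zh-Hant-TW", 8))
print(truncate_tag("en-a-aaa-bbb", 6))
```

Because the right-most subtags are lost first, placing the most significant information in the left-most subtags keeps a truncated tag useful.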
3.7 Initialization of the Registry 3.7 Initialization of the Registry
Upon publication of this document as a BCP, the Language Subtag Upon publication of this document as a BCP, the Language Subtag
Registry must be created and populated with the initial set of Registry MUST be created and populated with the initial set of
subtags. This includes converting the entries from the existing IANA subtags. This includes converting the entries from the existing IANA
language tag registry defined by RFC 3066 to the new format. This language tag registry defined by RFC 3066 to the new format. This
section defines the process for defining the new registry and section defines the process for defining the new registry and
performing the conversion of the old registry. performing the conversion of the old registry.
The impact on the IANA maintainers of the registry of this conversion The impact on the IANA maintainers of the registry of this conversion
will be a small increase in the frequency of new entries. The will be a small increase in the frequency of new entries. The
initial set of records represents no impact on IANA, since the work initial set of records represents no impact on IANA, since the work
to create it will be performed externally (as defined in this to create it will be performed externally (as defined in this
section). Future work will be limited to inserting or replacing section). Future work will be limited to inserting or replacing
skipping to change at page 36, line 23 skipping to change at page 37, line 21
deprecated. The 'Comments' field will contain the reason for the deprecated. The 'Comments' field will contain the reason for the
deprecation. The 'Preferred-Value' field will contain the tag that deprecation. The 'Preferred-Value' field will contain the tag that
replaces the value. For example, the tag "art-lojban" is deprecated replaces the value. For example, the tag "art-lojban" is deprecated
and will be placed in the grandfathered section. Its 'Deprecated' and will be placed in the grandfathered section. Its 'Deprecated'
field will contain the deprecation date (in this case "2003-09-02") field will contain the deprecation date (in this case "2003-09-02")
and the 'Preferred-Value' field the value "jbo". and the 'Preferred-Value' field the value "jbo".
Tags that are not deprecated and which contain subtags which are Tags that are not deprecated and which contain subtags which are
consistent with registration under the guidelines in this document consistent with registration under the guidelines in this document
will not automatically have a new subtag registration created for will not automatically have a new subtag registration created for
each eligible subtag. Interested parties may use the registration each eligible subtag. Interested parties MAY use the registration
process in Section 3.4 to register these subtags. If all of the process in Section 3.4 to register these subtags. If all of the
subtags in the original tag become fully defined by the resulting subtags in the original tag become fully defined by the resulting
registrations, then the original tag is superseded by this document. registrations, then the original tag is superseded by this document.
Such tags will have their record changed from type 'grandfathered' to Such tags will have their record changed from type 'grandfathered' to
type 'redundant' in the registry. For example, the subtag 'boont' type 'redundant' in the registry. For example, the subtag 'boont'
could be registered, resulting in the change of the grandfathered tag could be registered, resulting in the change of the grandfathered tag
"en-boont" to type redundant in the registry. "en-boont" to type redundant in the registry.
Tags that contain one or more subtags that do not match the valid Tags that contain one or more subtags that do not match the valid
registration pattern and which are not otherwise defined by this registration pattern and which are not otherwise defined by this
document will have records of type 'grandfathered' created in the document will have records of type 'grandfathered' created in the
registry. These records cannot become type 'redundant', but may have registry. These records cannot become type 'redundant', but MAY have
a 'Deprecated' and 'Preferred-Value' field added to them if a subtag a 'Deprecated' and 'Preferred-Value' field added to them if a subtag
assignment or combination of assignments renders the tag obsolete. assignment or combination of assignments renders the tag obsolete.
There will be a reasonable period in which the community may comment There MUST be a reasonable period in which the community can comment
on the proposed list entries, which SHALL be no less than four weeks on the proposed list entries, which SHALL be no less than four weeks
in length. At the completion of this period, the chair(s) will in length. At the completion of this period, the chair(s) will
notify iana@iana.org and the ltru and ietf-languages mail lists that notify iana@iana.org and the ltru and ietf-languages mail lists that
the task is complete and forward the necessary materials to IANA for the task is complete and forward the necessary materials to IANA for
publication. publication.
Registrations that are in process under the rules defined in RFC 3066 Registrations that are in process under the rules defined in RFC 3066
MAY be completed under the former rules, at the discretion of the MAY be completed under the former rules, at the discretion of the
language tag reviewer. Any new registrations submitted after the language tag reviewer. Any new registrations submitted after the
request for conversion of the registry MUST be rejected. request for conversion of the registry MUST be rejected.
All existing RFC 3066 language tag registrations will be maintained All existing RFC 3066 language tag registrations will be maintained
in perpetuity. in perpetuity.
Users of tags that are grandfathered should consider registering Users of tags that are grandfathered SHOULD consider registering
appropriate subtags in the IANA subtag registry (but are not required appropriate subtags in the IANA subtag registry (but are NOT REQUIRED
to). to).
UN numeric codes assigned to 'macro-geographical (continental)' or UN numeric codes assigned to 'macro-geographical (continental)' or
sub-regions not associated with an assigned ISO 3166 alpha-2 code are sub-regions not associated with an assigned ISO 3166 alpha-2 code are
defined in the IANA registry and are valid for use in language tags. defined in the IANA registry and are valid for use in language tags.
These codes MUST be added to the initial version of the registry. These codes MUST be added to the initial version of the registry.
The UN numeric codes for 'economic groupings' or 'other groupings', The UN numeric codes for 'economic groupings' or 'other groupings',
and the alphanumeric codes in Appendix X of the UN document MUST NOT and the alphanumeric codes in Appendix X of the UN document MUST NOT
be added to the registry. be added to the registry.
skipping to change at page 38, line 12 skipping to change at page 39, line 12
registry. Future changes or additions to this portion of the registry. Future changes or additions to this portion of the
registry are governed by the provisions of this document. registry are governed by the provisions of this document.
4. Formation and Processing of Language Tags 4. Formation and Processing of Language Tags
This section addresses how to use the registry with the language tag This section addresses how to use the registry with the language tag
format to choose, form and process language tags. format to choose, form and process language tags.
4.1 Choice of Language Tag 4.1 Choice of Language Tag
One may occasionally be faced with several possible tags for the same One is sometimes faced with the choice between several possible tags
body of text. for the same body of text.
Interoperability is best served when all users use the same language Interoperability is best served when all users use the same language
tag in order to represent the same language. If an application has tag in order to represent the same language. If an application has
requirements that make the rules here inapplicable, then that requirements that make the rules here inapplicable, then that
application risks damaging interoperability. It is strongly application risks damaging interoperability. It is strongly
RECOMMENDED that users not define their own rules for language tag RECOMMENDED that users not define their own rules for language tag
choice. choice.
Of particular note, many applications can benefit from the use of Of particular note, many applications can benefit from the use of
script subtags in language tags, as long as the use is consistent for script subtags in language tags, as long as the use is consistent for
a given context. Script subtags were not formally defined in RFC a given context. Script subtags were not formally defined in RFC
3066 and their use may affect matching and subtag identification by 3066 and their use can affect matching and subtag identification by
implementations of RFC 3066, as these subtags appear between the implementations of RFC 3066, as these subtags appear between the
primary language and region subtags. For example, if a user requests primary language and region subtags. For example, if a user requests
content in an implementation of Section 2.5 of RFC 3066 [23] using content in an implementation of Section 2.5 of RFC 3066 [24] using
the language range "en-US", content labeled "en-Latn-US" will not the language range "en-US", content labeled "en-Latn-US" will not
match the request. Therefore it is important to know when script match the request. Therefore it is important to know when script
subtags will customarily be used and when they should not be used. subtags will customarily be used and when they ought not be used. In
In the registry, the Suppress-Script field helps ensure greater the registry, the Suppress-Script field helps ensure greater
compatibility between the language tags generated according to the compatibility between the language tags generated according to the
rules in this document and language tags and tag processors or rules in this document and language tags and tag processors or
consumers based on RFC 3066 by defining when users should generally consumers based on RFC 3066 by defining when users SHOULD NOT include
not include a script subtag with a particular primary language a script subtag with a particular primary language subtag.
subtag.
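The "en-US" versus "en-Latn-US" example can be reproduced with a small sketch of prefix-style range matching in the manner of Section 2.5 of RFC 3066. The function name is hypothetical, and the rule coded here (a range matches a tag if the two are equal, or if the range is a prefix of the tag followed by "-", with "*" matching anything) is a simplified reading rather than a complete implementation.

```python
def rfc3066_matches(language_range, tag):
    """Return True if language_range matches tag under prefix matching.

    Comparison is case-insensitive. Tags are assumed to be ASCII.
    """
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


print(rfc3066_matches("en-US", "en-Latn-US"))  # the script subtag breaks the prefix
print(rfc3066_matches("en", "en-Latn-US"))
print(rfc3066_matches("en-US", "en-us"))
```

The first call fails to match because 'Latn' sits between the primary language and region subtags, which is the situation the Suppress-Script field is meant to reduce.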
Extended language subtags (type 'extlang' in the registry, see Extended language subtags (type 'extlang' in the registry, see
Section 3.1) also appear between the primary language and region Section 3.1) also appear between the primary language and region
subtags and are reserved for future standardization. Applications subtags and are reserved for future standardization. Applications
may benefit from their judicious use in forming language tags in the might benefit from their judicious use in forming language tags in
future and similar recommendations are expected to apply to their use the future. Similar recommendations are expected to apply to their
as apply to script subtags. use as apply to script subtags.
Standards, protocols and applications that reference this document Standards, protocols and applications that reference this document
normatively but apply different rules to the ones given in this normatively but apply different rules to the ones given in this
section MUST specify how the procedure varies from the one given section MUST specify how the procedure varies from the one given
here. here.
The choice of subtags used to form a language tag should be guided by The choice of subtags used to form a language tag SHOULD be guided by
the following rules: the following rules:
1. Use as precise a tag as possible, but no more specific than is 1. Use as precise a tag as possible, but no more specific than is
justified. Avoid using subtags that are not important for justified. Avoid using subtags that are not important for
distinguishing content in an application. distinguishing content in an application.
* For example, 'de' might suffice for tagging an email written * For example, 'de' might suffice for tagging an email written
in German, while "de-CH-1996" is probably unnecessarily in German, while "de-CH-1996" is probably unnecessarily
precise for such a task. precise for such a task.
skipping to change at page 39, line 24 skipping to change at page 40, line 24
the script adds some distinguishing information to the tag. The the script adds some distinguishing information to the tag. The
field 'Suppress-Script' in the primary language record in the field 'Suppress-Script' in the primary language record in the
registry indicates which script subtags do not add distinguishing registry indicates which script subtags do not add distinguishing
information for most applications. information for most applications.
* For example, the subtag 'Latn' should not be used with the * For example, the subtag 'Latn' should not be used with the
primary language 'en' because nearly all English documents are primary language 'en' because nearly all English documents are
written in the Latin script and it adds no distinguishing written in the Latin script and it adds no distinguishing
information. However, if a document were written in English information. However, if a document were written in English
mixing Latin script with another script such as Braille mixing Latin script with another script such as Braille
('Brai'), then it may be appropriate to choose to indicate ('Brai'), then it might be appropriate to choose to indicate
both scripts to aid in content selection, such as the both scripts to aid in content selection, such as the
application of a stylesheet. application of a stylesheet.
3. If a tag or subtag has a 'Preferred-Value' field in its registry 3. If a tag or subtag has a 'Preferred-Value' field in its registry
entry, then the value of that field SHOULD be used to form the entry, then the value of that field SHOULD be used to form the
language tag in preference to the tag or subtag in which the language tag in preference to the tag or subtag in which the
preferred value appears. preferred value appears.
* For example, use 'he' for Hebrew in preference to 'iw'. * For example, use 'he' for Hebrew in preference to 'iw'.
4. The 'und' (Undetermined) primary language subtag SHOULD NOT be 4. The 'und' (Undetermined) primary language subtag SHOULD NOT be
used to label content, even if the language is unknown. Omitting used to label content, even if the language is unknown. Omitting
the language tag altogether is preferred to using a tag with a the language tag altogether is preferred to using a tag with a
primary language subtag of 'und'. The 'und' subtag may be useful primary language subtag of 'und'. The 'und' subtag MAY be useful
for protocols that require a language tag to be provided. The for protocols that require a language tag to be provided. The
'und' subtag may also be useful when matching language tags in 'und' subtag MAY also be useful when matching language tags in
certain situations. certain situations.
5. The 'mul' (Multiple) primary language subtag SHOULD NOT be used 5. The 'mul' (Multiple) primary language subtag SHOULD NOT be used
whenever the protocol allows separate tags for multiple whenever the protocol allows separate tags for multiple
languages, as is the case for the Content-Language header in languages, as is the case for the Content-Language header in
HTTP. The 'mul' subtag conveys little useful information: HTTP. The 'mul' subtag conveys little useful information:
content in multiple languages should individually tag the content in multiple languages SHOULD individually tag the
languages where they appear or otherwise indicate the actual languages where they appear or otherwise indicate the actual
language in preference to the 'mul' subtag. language in preference to the 'mul' subtag.
6. The same variant subtag SHOULD NOT be used more than once within 6. The same variant subtag SHOULD NOT be used more than once within
a language tag. a language tag.
* For example, do not use "en-GB-scouse-scouse". * For example, do not use "en-GB-scouse-scouse".
To ensure consistent backward compatibility, this document contains To ensure consistent backward compatibility, this document contains
several provisions to account for potential instability in the several provisions to account for potential instability in the
skipping to change at page 40, line 26 skipping to change at page 41, line 26
signed or otherwise signaled) by human beings for communication of signed or otherwise signaled) by human beings for communication of
information to other human beings. Computer languages such as information to other human beings. Computer languages such as
programming languages are explicitly excluded. programming languages are explicitly excluded.
If a language tag B contains language tag A as a prefix, then B is If a language tag B contains language tag A as a prefix, then B is
typically "narrower" or "more specific" than A. For example, "zh- typically "narrower" or "more specific" than A. For example, "zh-
Hant-TW" is more specific than "zh-Hant". Hant-TW" is more specific than "zh-Hant".
This relationship is not guaranteed in all cases: specifically, This relationship is not guaranteed in all cases: specifically,
languages that begin with the same sequence of subtags are NOT languages that begin with the same sequence of subtags are NOT
guaranteed to be mutually intelligible, although they may be. For guaranteed to be mutually intelligible, although they might be. For
example, the tag "az" shares a prefix with both "az-Latn" example, the tag "az" shares a prefix with both "az-Latn"
(Azerbaijani written using the Latin script) and "az-Cyrl" (Azerbaijani written using the Latin script) and "az-Cyrl"
(Azerbaijani written using the Cyrillic script). A person fluent in (Azerbaijani written using the Cyrillic script). A person fluent in
one script may not be able to read the other, even though the text one script might not be able to read the other, even though the text
might be identical. Content tagged as "az" most probably is written might be identical. Content tagged as "az" most probably is written
in just one script and thus might not be intelligible to a reader in just one script and thus might not be intelligible to a reader
familiar with the other script. familiar with the other script.
The relationship between the tag and the information it relates to is The relationship between the tag and the information it relates to is
defined by the standard describing the context in which it appears. defined by the standard describing the context in which it appears.
Accordingly, this section can only give possible examples of its Accordingly, this section can only give possible examples of its
usage. usage.
o For a single information object, the associated language tags o For a single information object, the associated language tags
might be interpreted as the set of languages that is required for might be interpreted as the set of languages that is necessary for
a complete comprehension of the complete object. Example: Plain a complete comprehension of the complete object. Example: Plain
text documents. text documents.
o For an aggregation of information objects, the associated language o For an aggregation of information objects, the associated language
tags could be taken as the set of languages used inside components tags could be taken as the set of languages used inside components
of that aggregation. Examples: Document stores and libraries. of that aggregation. Examples: Document stores and libraries.
o For information objects whose purpose is to provide alternatives, o For information objects whose purpose is to provide alternatives,
the associated language tags could be regarded as a hint that the the associated language tags could be regarded as a hint that the
content is provided in several languages, and that one has to content is provided in several languages, and that one has to
skipping to change at page 41, line 23 skipping to change at page 42, line 23
Norwegian document; the Norwegian-speaking user could then access Norwegian document; the Norwegian-speaking user could then access
a French-Norwegian dictionary to find out what the marked section a French-Norwegian dictionary to find out what the marked section
meant. If the user were listening to that document through a meant. If the user were listening to that document through a
speech synthesis interface, this formation could be used to signal speech synthesis interface, this formation could be used to signal
the synthesizer to appropriately apply French text-to-speech the synthesizer to appropriately apply French text-to-speech
pronunciation rules to that span of text, instead of applying the pronunciation rules to that span of text, instead of applying the
inappropriate Norwegian rules. inappropriate Norwegian rules.
4.3 Canonicalization of Language Tags 4.3 Canonicalization of Language Tags
Since a particular language tag may be used in many processes, Since a particular language tag is sometimes used by many processes,
language tags SHOULD always be created or generated in a canonical language tags SHOULD always be created or generated in a canonical
form. form.
A language tag is in canonical form when: A language tag is in canonical form when:
1. The tag is well-formed according to the rules in Section 2.1 and 1. The tag is well-formed according to the rules in Section 2.1 and
Section 2.2. Section 2.2.
2. Subtags of type 'Region' that have a Preferred-Value mapping in 2. Subtags of type 'Region' that have a Preferred-Value mapping in
the IANA registry (see Section 3.1) SHOULD be replaced with their the IANA registry (see Section 3.1) SHOULD be replaced with their
skipping to change at page 42, line 24 skipping to change at page 43, line 24
Hebrides) is not canonical because the 'NH' subtag has a canonical Hebrides) is not canonical because the 'NH' subtag has a canonical
mapping to 'VU' (Vanuatu), although the tag "en-NH" maintains its mapping to 'VU' (Vanuatu), although the tag "en-NH" maintains its
validity. validity.
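The region replacement in the "en-NH" example might be sketched as below. The one-entry mapping is a hypothetical excerpt standing in for the IANA registry, and the heuristic for locating the region subtag (a two-letter subtag that is not first and that precedes any singleton) is an assumption made to keep the example short.

```python
# Hypothetical excerpt of region Preferred-Value mappings (NH -> VU is from
# the example above); a real processor would read these from the registry.
REGION_PREFERRED = {"nh": "VU"}


def replace_deprecated_region(tag):
    """Replace a region subtag with its Preferred-Value, if it has one."""
    subtags = tag.split("-")
    out = [subtags[0]]
    # Once a singleton is seen, the remaining subtags belong to an
    # extension or private-use sequence and are left untouched.
    after_singleton = len(subtags[0]) == 1
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            after_singleton = True
        if (not after_singleton and len(subtag) == 2
                and subtag.lower() in REGION_PREFERRED):
            subtag = REGION_PREFERRED[subtag.lower()]
        out.append(subtag)
    return "-".join(out)


print(replace_deprecated_region("en-NH"))
```

The input "en-NH" remains a valid tag; the sketch only produces the form that the canonicalization rules prefer.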
Canonicalization of language tags does not imply anything about the Canonicalization of language tags does not imply anything about the
use of upper or lowercase letters when processing or comparing use of upper or lowercase letters when processing or comparing
subtags (and as described in Section 2.1). All comparisons MUST be subtags (and as described in Section 2.1). All comparisons MUST be
performed in a case-insensitive manner. performed in a case-insensitive manner.
When performing canonicalization of language tags, processors MAY When performing canonicalization of language tags, processors MAY
optionally regularize the case of the subtags, following the case regularize the case of the subtags (that is, this process is
used in the registry. Note that this corresponds to the following OPTIONAL), following the case used in the registry. Note that this
casing rules: uppercase all non-initial two-letter subtags; titlecase corresponds to the following casing rules: uppercase all non-initial
all non-initial four-letter subtags; lowercase everything else. two-letter subtags; titlecase all non-initial four-letter subtags;
lowercase everything else.
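The casing rules just listed translate almost directly into code. The following sketch applies them literally to every subtag by position and length; the function name is hypothetical, and the input is assumed to be an ASCII, well-formed tag.

```python
def regularize_case(tag):
    """Apply the casing rules: uppercase non-initial two-letter subtags,
    titlecase non-initial four-letter subtags, lowercase everything else."""
    subtags = tag.split("-")
    out = [subtags[0].lower()]
    for subtag in subtags[1:]:
        if len(subtag) == 2:
            out.append(subtag.upper())
        elif len(subtag) == 4:
            # First character uppercase, remainder lowercase.
            out.append(subtag[0].upper() + subtag[1:].lower())
        else:
            out.append(subtag.lower())
    return "-".join(out)


print(regularize_case("ZH-hant-tw"))
print(regularize_case("de-ch-1996"))
```

Since case is not significant, the output identifies the same language as the input; regularizing only makes tags easier to read and compare by eye.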
Note: Case folding of ASCII letters in certain locales, unless Note: Case folding of ASCII letters in certain locales, unless
carefully handled, may produce non-ASCII character values. The carefully handled, sometimes produces non-ASCII character values.
Unicode Character Database file "SpecialCasing.txt" defines the The Unicode Character Database file "SpecialCasing.txt" defines the
specific cases that are known to cause problems with this. In specific cases that are known to cause problems with this. In
particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
Implementers should specify a locale-neutral casing operation to Implementers SHOULD specify a locale-neutral casing operation to
ensure that case folding of subtags does not produce this value, ensure that case folding of subtags does not produce this value,
which is illegal in language tags. For example, if one were to which is illegal in language tags. For example, if one were to
uppercase the region subtag 'in' using Turkish locale rules, the uppercase the region subtag 'in' using Turkish locale rules, the
sequence U+0130 U+004E would result instead of the expected 'IN'. sequence U+0130 U+004E would result instead of the expected 'IN'.
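One way to obtain a locale-neutral casing operation is to map only the 26 ASCII letters and leave every other character alone. The sketch below does this with explicit translation tables; the helper names are hypothetical. Python's built-in string methods do not consult the locale, so the tables are shown mainly to make the ASCII-only intent explicit.

```python
import string

# Translation tables that touch only the ASCII letters A-Z and a-z.
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_upper(text):
    """Uppercase ASCII letters only, regardless of locale."""
    return text.translate(_TO_UPPER)


def ascii_lower(text):
    """Lowercase ASCII letters only, regardless of locale."""
    return text.translate(_TO_LOWER)


def subtags_equal(a, b):
    """Compare two tags or subtags in a case-insensitive manner."""
    return ascii_lower(a) == ascii_lower(b)


print(ascii_upper("in"))
print(subtags_equal("en-IN", "EN-in"))
```

With these helpers the region subtag 'in' always uppercases to 'IN', and U+0130 can never be produced from ASCII input.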
Note: if the field 'Deprecated' appears in a registry record without Note: if the field 'Deprecated' appears in a registry record without
an accompanying 'Preferred-Value' field, then that tag or subtag is an accompanying 'Preferred-Value' field, then that tag or subtag is
deprecated without a replacement. Validating processors SHOULD NOT deprecated without a replacement. Validating processors SHOULD NOT
generate tags that include these values, although the values are generate tags that include these values, although the values are
canonical when they appear in a language tag. canonical when they appear in a language tag.
An extension MUST define any relationships that may exist between the An extension MUST define any relationships that exist between the
various subtags in the extension and thus MAY define an alternate various subtags in the extension and thus MAY define an alternate
canonicalization scheme for the extension's subtags. Extensions MAY canonicalization scheme for the extension's subtags. Extensions MAY
define how the order of the extension's subtags is interpreted. For define how the order of the extension's subtags is interpreted. For
example, an extension could define that its subtags are in canonical example, an extension could define that its subtags are in canonical
order when the subtags are placed into ASCII order: that is, "en-a- order when the subtags are placed into ASCII order: that is, "en-a-
aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension might aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension might
define that the order of the subtags influences their semantic define that the order of the subtags influences their semantic
meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b- meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b-
aaa-bbb-ccc"). However, extension specifications SHOULD be designed aaa-bbb-ccc"). However, extension specifications SHOULD be designed
so that they are tolerant of the typical processes described in so that they are tolerant of the typical processes described in
Section 3.6. Section 3.6.
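As an illustration of the first example above, the following sketch places the subtags of one extension into ASCII order. It assumes a hypothetical extension whose specification defines ASCII order as canonical; the function name `sort_extension_subtags` is invented here, and the sketch does not treat the private-use singleton 'x' specially.

```python
def sort_extension_subtags(tag: str, singleton: str) -> str:
    """Sort the subtags that follow `singleton` into ASCII order.

    Assumes the (hypothetical) extension defines ASCII order as canonical.
    The extension's subtags run until the next single-character subtag.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    try:
        # Search from position 1: the first subtag is the primary language.
        start = lowered.index(singleton.lower(), 1) + 1
    except ValueError:
        return tag  # the extension is not present; nothing to reorder
    end = start
    while end < len(subtags) and len(subtags[end]) > 1:
        end += 1
    subtags[start:end] = sorted(subtags[start:end])
    return "-".join(subtags)


print(sort_extension_subtags("en-a-ccc-bbb-aaa", "a"))
```

An extension whose subtag order carries meaning, like the 'b' example above, would not apply such a step at all.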
4.4 Considerations for Private Use Subtags 4.4 Considerations for Private Use Subtags
Private-use subtags require private agreement between the parties Private-use subtags require private agreement between the parties
that intend to use or exchange language tags that use them and great that intend to use or exchange language tags that use them and great
caution should be used in employing them in content or protocols caution SHOULD be used in employing them in content or protocols
intended for general use. Private-use subtags are simply useless for intended for general use. Private-use subtags are simply useless for
information exchange without prior arrangement. information exchange without prior arrangement.
The value and semantic meaning of private-use tags and of the subtags The value and semantic meaning of private-use tags and of the subtags
used within such a language tag are not defined by this document. used within such a language tag are not defined by this document.
The use of subtags defined in the IANA registry as having a specific The use of subtags defined in the IANA registry as having a specific
private use meaning conveys more information than a purely private use private use meaning conveys more information than a purely private use
tag prefixed by the singleton subtag 'x'. For applications this tag prefixed by the singleton subtag 'x'. For applications this
additional information may be useful. additional information MAY be useful.
For example, the region subtags 'AA', 'ZZ' and in the ranges For example, the region subtags 'AA', 'ZZ' and in the ranges
'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) may 'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY
be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a
great deal of public, interchangeable information about the language great deal of public, interchangeable information about the language
material (that it is Chinese in the simplified Chinese script and is material (that it is Chinese in the simplified Chinese script and is
suitable for some geographic region 'XQ'). While the precise suitable for some geographic region 'XQ'). While the precise
geographic region is not known outside of private agreement, the tag geographic region is not known outside of private agreement, the tag
conveys far more information than an opaque tag such as "x-someLang", conveys far more information than an opaque tag such as "x-someLang",
which contains no information about the language subtag or script which contains no information about the language subtag or script
subtag outside of the private agreement. subtag outside of the private agreement.
However, in some cases content tagged with private use subtags may However, in some cases content tagged with private use subtags MAY
interact with other systems in a different and possibly unsuitable interact with other systems in a different and possibly unsuitable
manner compared to tags that use opaque, privately defined subtags, manner compared to tags that use opaque, privately defined subtags,
so the choice of the best approach may depend on the particular so the choice of the best approach sometimes depends on the
domain in question. particular domain in question.
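The private-use region codes listed in this section ('AA', 'ZZ', 'QM'-'QZ', and 'XA'-'XZ') can be recognized with a simple range check. The sketch below is illustrative only; the function name `is_private_use_region` is an assumption, and it inspects a single region subtag rather than parsing a whole tag.

```python
def is_private_use_region(subtag: str) -> bool:
    """Return True for the private-use region codes named in Section 4.4."""
    # Only two ASCII letters can be an alphabetic region code.
    if len(subtag) != 2 or not subtag.isascii() or not subtag.isalpha():
        return False
    first, second = subtag.upper()  # safe: input is ASCII-only here
    if (first, second) in (("A", "A"), ("Z", "Z")):
        return True
    if first == "Q":
        return "M" <= second <= "Z"
    if first == "X":
        return "A" <= second <= "Z"
    return False


for region in ("XQ", "QM", "QL", "US"):
    print(region, is_private_use_region(region))
```

A processor could use such a check to recognize that "zh-Hans-XQ" still carries public language and script information even though its region is privately defined.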
5. IANA Considerations 5. IANA Considerations
This section deals with the processes and requirements necessary for This section deals with the processes and requirements necessary for
IANA to undertake to maintain the subtag and extension registries as IANA to undertake to maintain the subtag and extension registries as
defined by this document and in accordance with the requirements of defined by this document and in accordance with the requirements of
RFC 2434 [11]. RFC 2434 [11].
The impact on the IANA maintainers of the two registries defined by The impact on the IANA maintainers of the two registries defined by
this document will be a small increase in the frequency of new this document will be a small increase in the frequency of new
skipping to change at page 44, line 36 skipping to change at page 45, line 36
Future work on the Language Subtag Registry will be limited to Future work on the Language Subtag Registry will be limited to
inserting or replacing whole records preformatted for IANA by the inserting or replacing whole records preformatted for IANA by the
Language Subtag Reviewer as described in Section 3.2 of this Language Subtag Reviewer as described in Section 3.2 of this
document. Each record will be sent to iana@iana.org with a subject document. Each record will be sent to iana@iana.org with a subject
line indicating whether the enclosed record is an insertion (of a new line indicating whether the enclosed record is an insertion (of a new
record) or a replacement of an existing record which has a Type and record) or a replacement of an existing record which has a Type and
Subtag (or Tag) field that exactly matches the record sent. Records Subtag (or Tag) field that exactly matches the record sent. Records
cannot be deleted from the registry. cannot be deleted from the registry.
The Language Tag Extensions registry will also be generated and sent The Language Tag Extensions registry will also be generated and sent
to IANA as described in Section 3.6. This registry may contain at to IANA as described in Section 3.6. This registry can contain at
most 35 records and thus changes to this registry are expected to be most 35 records and thus changes to this registry are expected to be
very infrequent. very infrequent.
Future work by IANA on the Language Tag Extensions Registry is Future work by IANA on the Language Tag Extensions Registry is
limited to two cases. First, the IESG may request that new records limited to two cases. First, the IESG MAY request that new records
be inserted into this registry from time to time. These requests be inserted into this registry from time to time. These requests
will include the record to insert in the exact format described in will include the record to insert in the exact format described in
Section 3.6. In addition, there may be occasional requests from the Section 3.6. In addition, there MAY be occasional requests from the
maintaining authority for a specific extension to update the contact maintaining authority for a specific extension to update the contact
information or URLs in the record. These requests MUST include the information or URLs in the record. These requests MUST include the
complete, updated record. IANA is not responsible for validating the complete, updated record. IANA is not responsible for validating the
information provided, only that it is properly formatted. It should information provided, only that it is properly formatted. It should
reasonably be seen to come from the maintaining authority named in reasonably be seen to come from the maintaining authority named in
the record present in the registry. the record present in the registry.
6. Security Considerations 6. Security Considerations
Language tags used in content negotiation, like any other information Language tags used in content negotiation, like any other information
exchanged on the Internet, may be a source of concern because they exchanged on the Internet, might be a source of concern because they
may be used to infer the nationality of the sender, and thus identify might be used to infer the nationality of the sender, and thus
potential targets for surveillance. identify potential targets for surveillance.
This is a special case of the general problem that anything sent is This is a special case of the general problem that anything sent is
visible to the receiving party and possibly to third parties as well. visible to the receiving party and possibly to third parties as well.
It is useful to be aware that such concerns can exist in some cases. It is useful to be aware that such concerns can exist in some cases.
The evaluation of the exact magnitude of the threat, and any possible The evaluation of the exact magnitude of the threat, and any possible
countermeasures, is left to each application protocol (see BCP 72, countermeasures, is left to each application protocol (see BCP 72,
RFC 3552 [15] for best current practice guidance on security threats RFC 3552 [16] for best current practice guidance on security threats
and defenses). and defenses).
Since there is no limit to the number of variant, private use, and Since there is no limit to the number of variant, private use, and
extension subtags, and consequently no limit on the possible length extension subtags, and consequently no limit on the possible length
of a tag, implementations need to guard against buffer overflow of a tag, implementations need to guard against buffer overflow
attacks. See Section 2.1.1 for details on language tag attacks. See Section 2.1.1 for details on language tag
truncation, which can occur as a consequence of defenses against truncation, which can occur as a consequence of defenses against
buffer overflow. buffer overflow.
Although the specification of valid subtags for an extension (see: Although the specification of valid subtags for an extension (see:
skipping to change at page 46, line 18 skipping to change at page 47, line 18
characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
character sets, so the composition of language tags should not have character sets, so the composition of language tags should not have
any character set issues. any character set issues.
Rendering of characters based on the content of a language tag is not Rendering of characters based on the content of a language tag is not
addressed in this memo. Historically, some languages have relied on addressed in this memo. Historically, some languages have relied on
the use of specific character sets or other information in order to the use of specific character sets or other information in order to
infer how a specific character should be rendered (notably this infer how a specific character should be rendered (notably this
applies to language and culture specific variations of Han ideographs applies to language and culture specific variations of Han ideographs
as used in Japanese, Chinese, and Korean). When language tags are as used in Japanese, Chinese, and Korean). When language tags are
applied to spans of text, rendering engines may use that information applied to spans of text, rendering engines can use that information
in deciding which font to use in the absence of other information, in deciding which font to use in the absence of other information,
particularly where languages with distinct writing traditions use the particularly where languages with distinct writing traditions use the
same characters. same characters.
8. Changes from RFC 3066 8. Changes from RFC 3066
The main goals for this revision of language tags were the following: The main goals for this revision of language tags were the following:
*Compatibility.* All valid RFC 3066 language tags (including those *Compatibility.* All valid RFC 3066 language tags (including those
in the IANA registry) remain valid in this specification. Thus in the IANA registry) remain valid in this specification. Thus
there is complete backward compatibility of this specification with there is complete backward compatibility of this specification with
existing content. In addition, this document defines language tags existing content. In addition, this document defines language tags
in such a way as to ensure future compatibility, and processors in such a way as to ensure future compatibility, and processors
based solely on the RFC 3066 ABNF (such as those described in XML based solely on the RFC 3066 ABNF (such as those described in XML
Schema version 1.0 [19]) will be able to process tags described by Schema version 1.0 [20]) will be able to process tags described by
this document. this document.
*Stability.* Because of the changes in underlying ISO standards, a *Stability.* Because of the changes in underlying ISO standards, a
valid RFC 3066 language tag may become invalid (or have its meaning valid RFC 3066 language tag may become invalid (or have its meaning
change) at a later date. With so much of the world's computing change) at a later date. With so much of the world's computing
infrastructure dependent on language tags, this is simply infrastructure dependent on language tags, this is simply
unacceptable: it invalidates content that may have an extensive unacceptable: it invalidates content that may have an extensive
shelf-life. In this specification, once a language tag is valid, it shelf-life. In this specification, once a language tag is valid, it
remains valid forever. Previously, there was no way to determine remains valid forever. Previously, there was no way to determine
when two tags were equivalent. This specification provides a stable when two tags were equivalent. This specification provides a stable
skipping to change at page 51, line 5 skipping to change at page 51, line 33
the requirements for validating processors to require the prefix the requirements for validating processors to require the prefix
with variants and extlangs. (#1018) (J.Cowan, F.Ellerman) with variants and extlangs. (#1018) (J.Cowan, F.Ellerman)
o Added notes about when variants may be used together and the o Added notes about when variants may be used together and the
relationship of the 'Prefix' field to this in Section 2.2.5 relationship of the 'Prefix' field to this in Section 2.2.5
(A.Phillips) (A.Phillips)
o Specified that 'Prefix' fields may be added only to 'variant' o Specified that 'Prefix' fields may be added only to 'variant'
subtag records and not to 'extlang' records. (J.Cowan) subtag records and not to 'extlang' records. (J.Cowan)
o Converted lowercase RFC 2119 words to their RFC 2119 normative
equivalent. A few exceptions remain (where the words functioned
in a non-normative fashion). (I.McDonald)
o Rewrote Section 2.1.1 so that it deals with a canonical minimum
maximum length, etc. (#944)
9. References 9. References
9.1 Normative References 9.1 Normative References
[1] International Organization for Standardization, "ISO 639- [1] International Organization for Standardization, "ISO 639-
1:2002, Codes for the representation of names of languages -- 1:2002, Codes for the representation of names of languages --
Part 1: Alpha-2 code", ISO Standard 639, 2002. Part 1: Alpha-2 code", ISO Standard 639, 2002.
[2] International Organization for Standardization, "ISO 639-2:1998 [2] International Organization for Standardization, "ISO 639-2:1998
- Codes for the representation of names of languages -- Part 2: - Codes for the representation of names of languages -- Part 2:
skipping to change at page 52, line 13 skipping to change at page 53, line 13
Considerations Section in RFCs", BCP 26, RFC 2434, Considerations Section in RFCs", BCP 26, RFC 2434,
October 1998. October 1998.
[12] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646", [12] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646",
RFC 2781, February 2000. RFC 2781, February 2000.
[13] Carpenter, B., Baker, F., and M. Roberts, "Memorandum of [13] Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
Understanding Concerning the Technical Work of the Internet Understanding Concerning the Technical Work of the Internet
Assigned Numbers Authority", RFC 2860, June 2000. Assigned Numbers Authority", RFC 2860, June 2000.
[14] Klyne, G. and C. Newman, "Date and Time on the Internet: [14] Resnick, P., "Internet Message Format", RFC 2822, April 2001.
[15] Klyne, G. and C. Newman, "Date and Time on the Internet:
Timestamps", RFC 3339, July 2002. Timestamps", RFC 3339, July 2002.
[15] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on [16] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on
Security Considerations", BCP 72, RFC 3552, July 2003. Security Considerations", BCP 72, RFC 3552, July 2003.
9.2 Informative References 9.2 Informative References
[16] ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory [17] ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory
Committee: Working principles for ISO 639 maintenance", Committee: Working principles for ISO 639 maintenance",
March 2000, March 2000,
<http://www.loc.gov/standards/iso639-2/iso639jac_n3r.html>. <http://www.loc.gov/standards/iso639-2/iso639jac_n3r.html>.
[17] Raymond, E., "The Art of Unix Programming", 2003. [18] Raymond, E., "The Art of Unix Programming", 2003.
[18] Bray (et al), T., "Extensible Markup Language (XML) 1.0", [19] Bray (et al), T., "Extensible Markup Language (XML) 1.0",
02 2004. 02 2004.
[19] Biron, P., Ed. and A. Malhotra, Ed., "XML Schema Part 2: [20] Biron, P., Ed. and A. Malhotra, Ed., "XML Schema Part 2:
Datatypes Second Edition", 10 2004, < Datatypes Second Edition", 10 2004, <
http://www.w3.org/TR/xmlschema-2/>. http://www.w3.org/TR/xmlschema-2/>.
[20] Unicode Consortium, "The Unicode Consortium. The Unicode [21] Unicode Consortium, "The Unicode Consortium. The Unicode
Standard, Version 4.1.0, defined by: The Unicode Standard, Standard, Version 4.1.0, defined by: The Unicode Standard,
Version 4.0 (Boston, MA, Addison-Wesley, 2003. ISBN 0-321- Version 4.0 (Boston, MA, Addison-Wesley, 2003. ISBN 0-321-
18578-1), as amended by Unicode 4.0.1 18578-1), as amended by Unicode 4.0.1
(http://www.unicode.org/versions/Unicode4.0.1) and by Unicode (http://www.unicode.org/versions/Unicode4.0.1) and by Unicode
4.1.0 (http://www.unicode.org/versions/Unicode4.1.0).", 4.1.0 (http://www.unicode.org/versions/Unicode4.1.0).",
March 2005. March 2005.
[21] Alvestrand, H., "Tags for the Identification of Languages", [22] Alvestrand, H., "Tags for the Identification of Languages",
RFC 1766, March 1995. RFC 1766, March 1995.
[22] Freed, N. and K. Moore, "MIME Parameter Value and Encoded Word [23] Freed, N. and K. Moore, "MIME Parameter Value and Encoded Word
Extensions: Character Sets, Languages, and Continuations", Extensions: Character Sets, Languages, and Continuations",
RFC 2231, November 1997. RFC 2231, November 1997.
[23] Alvestrand, H., "Tags for the Identification of Languages", [24] Alvestrand, H., "Tags for the Identification of Languages",
BCP 47, RFC 3066, January 2001. BCP 47, RFC 3066, January 2001.
Authors' Addresses Authors' Addresses
Addison Phillips (editor) Addison Phillips (editor)
Quest Software Quest Software
Email: addison.phillips@quest.com Email: addison.phillips@quest.com
Mark Davis (editor) Mark Davis (editor)
skipping to change at page 55, line 48 skipping to change at page 56, line 48
en-scouse (Scouse dialect of English) en-scouse (Scouse dialect of English)
Language-Region-Variant: Language-Region-Variant:
en-GB-scouse (Scouse dialect of English as used in the UK) en-GB-scouse (Scouse dialect of English as used in the UK)
Language-Script-Region-Variant: Language-Script-Region-Variant:
sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the
Latin script as used in Italy. Note that this tag is not Latin script as used in Italy. Note that this tag is NOT
recommended because subtag 'sl' has a Suppress-Script value of RECOMMENDED because subtag 'sl' has a Suppress-Script value of
'Latn') 'Latn')
Language-Region: Language-Region:
de-DE (German for Germany) de-DE (German for Germany)
en-US (English as used in the United States) en-US (English as used in the United States)
es-419 (Spanish for Latin America and Caribbean region using the es-419 (Spanish for Latin America and Caribbean region using the
UN region code) UN region code)
Private-use subtags: Private-use subtags:
de-CH-x-phonebk de-CH-x-phonebk
az-Arab-x-AZE-derbend az-Arab-x-AZE-derbend
Extended language subtags (examples ONLY: extended languages must be Extended language subtags (examples ONLY: extended languages MUST be
defined by revision or update to this document): defined by revision or update to this document):
zh-min zh-min
zh-min-nan-Hant-CN zh-min-nan-Hant-CN
Private-use registry values: Private-use registry values:
x-whatever (private use using the singleton 'x') x-whatever (private use using the singleton 'x')
qaa-Qaaa-QM-x-southern (all private tags) qaa-Qaaa-QM-x-southern (all private tags)
de-Qaaa (German, with a private script) de-Qaaa (German, with a private script)
sr-Latn-QM (Serbian, Latin-script, private region) sr-Latn-QM (Serbian, Latin-script, private region)
sr-Qaaa-CS (Serbian, private script, for Serbia and Montenegro) sr-Qaaa-CS (Serbian, private script, for Serbia and Montenegro)
Tags that use extensions (examples ONLY: extensions must be defined Tags that use extensions (examples ONLY: extensions MUST be defined
by revision or update to this document or by RFC): by revision or update to this document or by RFC):
en-US-u-islamCal en-US-u-islamCal
zh-CN-a-myExt-x-private zh-CN-a-myExt-x-private
en-a-myExt-b-another en-a-myExt-b-another
Some Invalid Tags: Some Invalid Tags:
skipping to change at page 61, line 12 skipping to change at page 62, line 12
Tag: az-Arab Tag: az-Arab
Description: Azerbaijani in Arabic script Description: Azerbaijani in Arabic script
Added: 2003-05-30 Added: 2003-05-30
%% %%
Type: redundant Type: redundant
Tag: az-Cyrl Tag: az-Cyrl
Description: Azerbaijani in Cyrillic script Description: Azerbaijani in Cyrillic script
Added: 2003-05-30 Added: 2003-05-30
%% %%
Figure 8: Example of the Registry Format Figure 9: Example of the Registry Format
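The record layout shown in the figure ("Field: value" lines, with records separated by "%%") lends itself to a very small parser. The following is a rough sketch under stated assumptions: `parse_records` and `SAMPLE` are names invented for illustration, continuation (folded) lines are not handled, and a repeated field name within one record keeps only its last value. The first sample record omits its Type field because that line is not visible in the excerpt above.

```python
from typing import Dict, List


def parse_records(text: str) -> List[Dict[str, str]]:
    """Split registry text into records of field-name -> value mappings."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "%%":  # record separator
            if current:
                records.append(current)
            current = {}
        elif ":" in line:
            name, _, value = line.partition(":")
            current[name.strip()] = value.strip()
    if current:  # final record without a trailing separator
        records.append(current)
    return records


SAMPLE = """\
Tag: az-Arab
Description: Azerbaijani in Arabic script
Added: 2003-05-30
%%
Type: redundant
Tag: az-Cyrl
Description: Azerbaijani in Cyrillic script
Added: 2003-05-30
%%
"""

for record in parse_records(SAMPLE):
    print(record.get("Tag"), "|", record.get("Description"))
```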
Intellectual Property Statement Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be on the procedures with respect to rights in RFC documents can be
 End of changes. 
