draft-ietf-ltru-registry-13.txt   draft-ietf-ltru-registry-14.txt 
Network Working Group A. Phillips, Ed. Network Working Group A. Phillips, Ed.
Internet-Draft Quest Software Internet-Draft Quest Software
Obsoletes: 3066 (if approved) M. Davis, Ed. Obsoletes: 3066 (if approved) M. Davis, Ed.
Expires: March 26, 2006 IBM Expires: April 17, 2006 IBM
September 22, 2005 October 14, 2005
Tags for Identifying Languages Tags for Identifying Languages
draft-ietf-ltru-registry-13 draft-ietf-ltru-registry-14
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on March 26, 2006. This Internet-Draft will expire on April 17, 2006.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2005). Copyright (C) The Internet Society (2005).
Abstract Abstract
This document describes the structure, content, construction, and This document describes the structure, content, construction, and
semantics of language tags for use in cases where it is desirable to semantics of language tags for use in cases where it is desirable to
indicate the language used in an information object. It also indicate the language used in an information object. It also
describes how to register values for use in language tags and the describes how to register values for use in language tags and the
creation of user defined extensions for private interchange. creation of user defined extensions for private interchange.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. The Language Tag . . . . . . . . . . . . . . . . . . . . . . . 4 2. The Language Tag . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Language Subtag Sources and Interpretation . . . . . . . . 6 2.2. Language Subtag Sources and Interpretation . . . . . . . . 6
2.2.1 Primary Language Subtag . . . . . . . . . . . . . . . 8 2.2.1. Primary Language Subtag . . . . . . . . . . . . . . . 8
2.2.2 Extended Language Subtags . . . . . . . . . . . . . . 10 2.2.2. Extended Language Subtags . . . . . . . . . . . . . . 10
2.2.3 Script Subtag . . . . . . . . . . . . . . . . . . . . 10 2.2.3. Script Subtag . . . . . . . . . . . . . . . . . . . . 10
2.2.4 Region Subtag . . . . . . . . . . . . . . . . . . . . 11 2.2.4. Region Subtag . . . . . . . . . . . . . . . . . . . . 11
2.2.5 Variant Subtags . . . . . . . . . . . . . . . . . . . 13 2.2.5. Variant Subtags . . . . . . . . . . . . . . . . . . . 13
2.2.6 Extension Subtags . . . . . . . . . . . . . . . . . . 14 2.2.6. Extension Subtags . . . . . . . . . . . . . . . . . . 14
2.2.7 Private Use Subtags . . . . . . . . . . . . . . . . . 15 2.2.7. Private Use Subtags . . . . . . . . . . . . . . . . . 15
2.2.8 Pre-Existing RFC 3066 Registrations . . . . . . . . . 16 2.2.8. Pre-Existing RFC 3066 Registrations . . . . . . . . . 16
2.2.9 Classes of Conformance . . . . . . . . . . . . . . . . 16 2.2.9. Classes of Conformance . . . . . . . . . . . . . . . . 16
3. Registry Format and Maintenance . . . . . . . . . . . . . . . 18 3. Registry Format and Maintenance . . . . . . . . . . . . . . . 18
3.1 Format of the IANA Language Subtag Registry . . . . . . . 18 3.1. Format of the IANA Language Subtag Registry . . . . . . . 18
3.2 Language Subtag Reviewer . . . . . . . . . . . . . . . . . 23 3.2. Language Subtag Reviewer . . . . . . . . . . . . . . . . . 23
3.3 Maintenance of the Registry . . . . . . . . . . . . . . . 24 3.3. Maintenance of the Registry . . . . . . . . . . . . . . . 24
3.4 Stability of IANA Registry Entries . . . . . . . . . . . . 25 3.4. Stability of IANA Registry Entries . . . . . . . . . . . . 25
3.5 Registration Procedure for Subtags . . . . . . . . . . . . 28 3.5. Registration Procedure for Subtags . . . . . . . . . . . . 28
3.6 Possibilities for Registration . . . . . . . . . . . . . . 31 3.6. Possibilities for Registration . . . . . . . . . . . . . . 31
3.7 Extensions and Extensions Registry . . . . . . . . . . . . 33 3.7. Extensions and Extensions Registry . . . . . . . . . . . . 33
3.8 Initialization of the Registries . . . . . . . . . . . . . 36 3.8. Initialization of the Registries . . . . . . . . . . . . . 36
4. Formation and Processing of Language Tags . . . . . . . . . . 38 4. Formation and Processing of Language Tags . . . . . . . . . . 38
4.1 Choice of Language Tag . . . . . . . . . . . . . . . . . . 38 4.1. Choice of Language Tag . . . . . . . . . . . . . . . . . . 38
4.2 Meaning of the Language Tag . . . . . . . . . . . . . . . 40 4.2. Meaning of the Language Tag . . . . . . . . . . . . . . . 40
4.3 Length Considerations . . . . . . . . . . . . . . . . . . 41 4.3. Length Considerations . . . . . . . . . . . . . . . . . . 41
4.3.1 Working with Limited Buffer Sizes . . . . . . . . . . 41 4.3.1. Working with Limited Buffer Sizes . . . . . . . . . . 41
4.3.2 Truncation of Language Tags . . . . . . . . . . . . . 43 4.3.2. Truncation of Language Tags . . . . . . . . . . . . . 43
4.4 Canonicalization of Language Tags . . . . . . . . . . . . 43 4.4. Canonicalization of Language Tags . . . . . . . . . . . . 43
4.5 Considerations for Private Use Subtags . . . . . . . . . . 45 4.5. Considerations for Private Use Subtags . . . . . . . . . . 45
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 47 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 47
5.1 Language Subtag Registry . . . . . . . . . . . . . . . . . 47 5.1. Language Subtag Registry . . . . . . . . . . . . . . . . . 47
5.2 Extensions Registry . . . . . . . . . . . . . . . . . . . 48 5.2. Extensions Registry . . . . . . . . . . . . . . . . . . . 48
6. Security Considerations . . . . . . . . . . . . . . . . . . . 49 6. Security Considerations . . . . . . . . . . . . . . . . . . . 49
7. Character Set Considerations . . . . . . . . . . . . . . . . . 50 7. Character Set Considerations . . . . . . . . . . . . . . . . . 50
8. Changes from RFC 3066 . . . . . . . . . . . . . . . . . . . . 51 8. Changes from RFC 3066 . . . . . . . . . . . . . . . . . . . . 51
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 54 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 54
9.1 Normative References . . . . . . . . . . . . . . . . . . . 54 9.1. Normative References . . . . . . . . . . . . . . . . . . . 54
9.2 Informative References . . . . . . . . . . . . . . . . . . 55 9.2. Informative References . . . . . . . . . . . . . . . . . . 55
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 56 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 57
A. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 57 Appendix B. Examples of Language Tags (Informative) . . . . . . . 58
B. Examples of Language Tags (Informative) . . . . . . . . . . . 58 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 61
Intellectual Property and Copyright Statements . . . . . . . . 61 Intellectual Property and Copyright Statements . . . . . . . . . . 62
1. Introduction 1. Introduction
Human beings on our planet have, past and present, used a number of Human beings on our planet have, past and present, used a number of
languages. There are many reasons why one would want to identify the languages. There are many reasons why one would want to identify the
language used when presenting or requesting information. language used when presenting or requesting information.
A user's language preferences often need to be identified so that A user's language preferences often need to be identified so that
appropriate processing can be applied. For example, the user's appropriate processing can be applied. For example, the user's
language preferences in a Web browser can be used to select Web pages language preferences in a Web browser can be used to select Web pages
skipping to change at page 4, line 13 skipping to change at page 4, line 13
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
2. The Language Tag 2. The Language Tag
Language tags are used to help identify languages, whether spoken, Language tags are used to help identify languages, whether spoken,
written, signed, or otherwise signaled, for the purpose of written, signed, or otherwise signaled, for the purpose of
communication. This includes constructed and artificial languages, communication. This includes constructed and artificial languages,
but excludes languages not intended primarily for human but excludes languages not intended primarily for human
communication, such as programming languages. communication, such as programming languages.
2.1 Syntax 2.1. Syntax
The language tag is composed of one or more parts or "subtags". Each The language tag is composed of one or more parts or "subtags". Each
subtag consists of a sequence of alpha-numeric characters. Subtags subtag consists of a sequence of alpha-numeric characters. Subtags
are distinguished and separated from one another by a hyphen ("-", are distinguished and separated from one another by a hyphen ("-",
ABNF [RFC2234bis] %x2D). A language tag consists of a "primary ABNF [RFC4234] %x2D). A language tag consists of a "primary
language" subtag and a (possibly empty) series of subsequent subtags, language" subtag and a (possibly empty) series of subsequent subtags,
each of which refines or narrows the range of language identified by each of which refines or narrows the range of language identified by
the overall tag. the overall tag.
Usually, each type of subtag is distinguished by length, position in Usually, each type of subtag is distinguished by length, position in
the tag, and content: subtags can be recognized solely by these the tag, and content: subtags can be recognized solely by these
features. The only exception to this is a fixed list of features. The only exception to this is a fixed list of
grandfathered tags registered under RFC 3066 [RFC3066]. This makes grandfathered tags registered under RFC 3066 [RFC3066]. This makes
it possible to construct a parser that can extract and assign some it possible to construct a parser that can extract and assign some
semantic information to the subtags, even if the specific subtag semantic information to the subtags, even if the specific subtag
values are not recognized. Thus a parser need not have an up-to-date values are not recognized. Thus a parser need not have an up-to-date
copy (or any copy at all) of the subtag registry to perform most copy (or any copy at all) of the subtag registry to perform most
searching and matching operations. searching and matching operations.
The syntax of the language tag in ABNF [RFC2234bis] is: The syntax of the language tag in ABNF [RFC4234] is:
Language-Tag = langtag Language-Tag = langtag
/ privateuse ; private use tag / privateuse ; private use tag
/ grandfathered ; grandfathered registrations / grandfathered ; grandfathered registrations
langtag = (language langtag = (language
["-" script] ["-" script]
["-" region] ["-" region]
*("-" variant) *("-" variant)
*("-" extension) *("-" extension)
skipping to change at page 6, line 9 skipping to change at page 6, line 9
Figure 1: Language Tag ABNF Figure 1: Language Tag ABNF
Note: There is a subtlety in the ABNF for 'variant': variants Note: There is a subtlety in the ABNF for 'variant': variants
starting with a digit MAY be four characters long, while those starting with a digit MAY be four characters long, while those
starting with a letter MUST be at least five characters long. starting with a letter MUST be at least five characters long.
All subtags have a maximum length of eight characters and whitespace All subtags have a maximum length of eight characters and whitespace
is not permitted in a language tag. For examples of language tags, is not permitted in a language tag. For examples of language tags,
see Appendix B. see Appendix B.
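The length rules in the note and paragraph above can be sketched as a small shape check. This is a minimal illustration, not the draft's ABNF: the function names are hypothetical, and the check looks only at length and character content, not at whether a subtag is registered.

```python
import string

# ASCII letters and digits only; language tags are drawn from US-ASCII.
ALPHANUM = set(string.ascii_letters + string.digits)


def is_subtag_shaped(subtag):
    """Return True if the subtag is 1 to 8 ASCII letters or digits."""
    return 1 <= len(subtag) <= 8 and all(ch in ALPHANUM for ch in subtag)


def is_variant_shaped(subtag):
    """Return True if the subtag has the length and content of a variant.

    Four-character variants must start with a digit; variants that
    start with a letter must be at least five characters long.
    """
    if not is_subtag_shaped(subtag):
        return False
    if len(subtag) == 4:
        return subtag[0] in string.digits
    return 5 <= len(subtag) <= 8
```

Under this sketch '1996' and 'boont' are variant-shaped, while 'Latn' is not, because a four-letter subtag beginning with a letter has the shape of a script subtag.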
Note that although [RFC2234bis] refers to octets, the language tags Note that although [RFC4234] refers to octets, the language tags
described in this document are sequences of characters from the US- described in this document are sequences of characters from the US-
ASCII [ISO646] repertoire. Language tags MAY be used in documents ASCII [ISO646] repertoire. Language tags MAY be used in documents
and applications that use other encodings, so long as these encompass and applications that use other encodings, so long as these encompass
the US-ASCII repertoire. An example of this would be an XML document the US-ASCII repertoire. An example of this would be an XML document
that uses the UTF-16LE [RFC2781] encoding of [Unicode]. that uses the UTF-16LE [RFC2781] encoding of [Unicode].
The tags and their subtags, including private use and extensions, are The tags and their subtags, including private use and extensions, are
to be treated as case insensitive: there exist conventions for the to be treated as case insensitive: there exist conventions for the
capitalization of some of the subtags, but these MUST NOT be taken to capitalization of some of the subtags, but these MUST NOT be taken to
carry meaning. carry meaning.
skipping to change at page 6, line 47 skipping to change at page 6, line 47
variations conveys the same meaning: Mongolian written in the variations conveys the same meaning: Mongolian written in the
Cyrillic script as used in Mongolia. Cyrillic script as used in Mongolia.
Although case distinctions do not carry meaning in language tags, Although case distinctions do not carry meaning in language tags,
consistent formatting and presentation of the tags will aid users. consistent formatting and presentation of the tags will aid users.
The format of the tags and subtags in the registry is RECOMMENDED. The format of the tags and subtags in the registry is RECOMMENDED.
In this format, all non-initial two-letter subtags are uppercase, all In this format, all non-initial two-letter subtags are uppercase, all
non-initial four-letter subtags are titlecase, and all other subtags non-initial four-letter subtags are titlecase, and all other subtags
are lowercase. are lowercase.
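The recommended formatting, and the rule that case carries no meaning, can be illustrated with a short sketch. The function names are hypothetical, and the formatting rule is applied literally to every non-initial subtag, including any that follow a private use singleton.

```python
def format_tag(tag):
    """Apply the recommended case conventions to a language tag."""
    formatted = []
    for index, subtag in enumerate(tag.split("-")):
        if index > 0 and len(subtag) == 2:
            # Non-initial two-letter subtags are uppercase.
            formatted.append(subtag.upper())
        elif index > 0 and len(subtag) == 4:
            # Non-initial four-letter subtags are titlecase.
            formatted.append(subtag.capitalize())
        else:
            # Everything else, including the first subtag, is lowercase.
            formatted.append(subtag.lower())
    return "-".join(formatted)


def same_tag(left, right):
    """Compare two tags without regard to case."""
    return left.lower() == right.lower()
```

For instance, format_tag("SR-LATN-cs") yields "sr-Latn-CS", and same_tag treats both spellings as the same tag.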
2.2 Language Subtag Sources and Interpretation 2.2. Language Subtag Sources and Interpretation
The namespace of language tags and their subtags is administered by The namespace of language tags and their subtags is administered by
the Internet Assigned Numbers Authority (IANA) [RFC2860] according to the Internet Assigned Numbers Authority (IANA) [RFC2860] according to
the rules in Section 5 of this document. The Language Subtag the rules in Section 5 of this document. The Language Subtag
Registry maintained by IANA is the source for valid subtags: other Registry maintained by IANA is the source for valid subtags: other
standards referenced in this section provide the source material for standards referenced in this section provide the source material for
that registry. that registry.
Terminology in this section: Terminology in this section:
skipping to change at page 8, line 13 skipping to change at page 8, line 13
defined in this document. defined in this document.
o All other single letter subtags are reserved to introduce o All other single letter subtags are reserved to introduce
standardized extension subtag sequences as described in standardized extension subtag sequences as described in
Section 3.7. Section 3.7.
The single letter subtag 'i' is used by some grandfathered tags, such The single letter subtag 'i' is used by some grandfathered tags, such
as "i-enochian", where it always appears in the first position and as "i-enochian", where it always appears in the first position and
cannot be confused with an extension. cannot be confused with an extension.
2.2.1 Primary Language Subtag 2.2.1. Primary Language Subtag
The primary language subtag is the first subtag in a language tag The primary language subtag is the first subtag in a language tag
(with the exception of private use and certain grandfathered tags) (with the exception of private use and certain grandfathered tags)
and cannot be omitted. The following rules apply to the primary and cannot be omitted. The following rules apply to the primary
language subtag: language subtag:
1. All two character language subtags were defined in the IANA 1. All two character language subtags were defined in the IANA
registry according to the assignments found in the standard ISO registry according to the assignments found in the standard ISO
639 Part 1, "ISO 639-1:2002, Codes for the representation of 639 Part 1, "ISO 639-1:2002, Codes for the representation of
names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using
skipping to change at page 10, line 12 skipping to change at page 10, line 12
currently has no two character code, the tag would not be invalidated currently has no two character code, the tag would not be invalidated
if ISO 639-1 were to assign a two character code to the Hawaiian if ISO 639-1 were to assign a two character code to the Hawaiian
language at a later date. language at a later date.
For example, one of the grandfathered IANA registrations is For example, one of the grandfathered IANA registrations is
"i-enochian". The subtag 'enochian' could be registered in the IANA "i-enochian". The subtag 'enochian' could be registered in the IANA
registry as a primary language subtag (assuming that ISO 639 does not registry as a primary language subtag (assuming that ISO 639 does not
register this language first), making tags such as "enochian-AQ" and register this language first), making tags such as "enochian-AQ" and
"enochian-Latn" valid. "enochian-Latn" valid.
2.2.2 Extended Language Subtags 2.2.2. Extended Language Subtags
The following rules apply to the extended language subtags: The following rules apply to the extended language subtags:
1. Three letter subtags immediately following the primary subtag are 1. Three letter subtags immediately following the primary subtag are
reserved for future standardization, anticipating work that is reserved for future standardization, anticipating work that is
currently under way on ISO 639. currently under way on ISO 639.
2. Extended language subtags MUST follow the primary subtag and 2. Extended language subtags MUST follow the primary subtag and
precede any other subtags. precede any other subtags.
skipping to change at page 10, line 40 skipping to change at page 10, line 40
Extended language subtag records, once they appear in the registry, Extended language subtag records, once they appear in the registry,
MUST include exactly one 'Prefix' field indicating an appropriate MUST include exactly one 'Prefix' field indicating an appropriate
language subtag or sequence of subtags that MUST always appear as a language subtag or sequence of subtags that MUST always appear as a
prefix to the extended language subtag. prefix to the extended language subtag.
Example: In a future revision or update of this document, the tag Example: In a future revision or update of this document, the tag
"zh-gan" (registered under RFC 3066) might become a valid non- "zh-gan" (registered under RFC 3066) might become a valid non-
grandfathered (that is, redundant) tag in which the subtag 'gan' grandfathered (that is, redundant) tag in which the subtag 'gan'
might represent the Chinese dialect 'Gan'. might represent the Chinese dialect 'Gan'.
2.2.3 Script Subtag 2.2.3. Script Subtag
Script subtags are used to indicate the script or writing system Script subtags are used to indicate the script or writing system
variations that distinguish the written forms of a language or its variations that distinguish the written forms of a language or its
dialects. The following rules apply to the script subtags: dialects. The following rules apply to the script subtags:
1. All four character subtags were defined according to 1. All four character subtags were defined according to
[ISO15924]--"Codes for the representation of the names of [ISO15924]--"Codes for the representation of the names of
scripts": alpha-4 script codes, or subsequently assigned by the scripts": alpha-4 script codes, or subsequently assigned by the
ISO 15924 maintenance agency or governing standardization bodies, ISO 15924 maintenance agency or governing standardization bodies,
denoting the script or writing system used in conjunction with denoting the script or writing system used in conjunction with
skipping to change at page 11, line 27 skipping to change at page 11, line 27
for registration for that purpose. for registration for that purpose.
5. There MUST be at most one script subtag in a language tag and the 5. There MUST be at most one script subtag in a language tag and the
script subtag SHOULD be omitted when it adds no distinguishing script subtag SHOULD be omitted when it adds no distinguishing
value to the tag or when the primary language subtag's record value to the tag or when the primary language subtag's record
includes a Suppress-Script field listing the applicable script includes a Suppress-Script field listing the applicable script
subtag. subtag.
Example: "sr-Latn" represents Serbian written using the Latin script. Example: "sr-Latn" represents Serbian written using the Latin script.
2.2.4 Region Subtag 2.2.4. Region Subtag
Region subtags are used to indicate linguistic variations associated Region subtags are used to indicate linguistic variations associated
with or appropriate to a specific country, territory, or region. with or appropriate to a specific country, territory, or region.
Typically, a region subtag is used to indicate regional dialects or Typically, a region subtag is used to indicate regional dialects or
usage, or region-specific spelling conventions. A region subtag can usage, or region-specific spelling conventions. A region subtag can
also be used to indicate that content is expressed in a way that is also be used to indicate that content is expressed in a way that is
appropriate for use throughout a region; for instance, Spanish appropriate for use throughout a region; for instance, Spanish
content tailored to be useful throughout Latin America. content tailored to be useful throughout Latin America.
The following rules apply to the region subtags: The following rules apply to the region subtags:
skipping to change at page 13, line 23 skipping to change at page 13, line 23
Section 4.5 for more information on private use subtags. Section 4.5 for more information on private use subtags.
"de-CH" represents German ('de') as used in Switzerland ('CH'). "de-CH" represents German ('de') as used in Switzerland ('CH').
"sr-Latn-CS" represents Serbian ('sr') written using Latin script "sr-Latn-CS" represents Serbian ('sr') written using Latin script
('Latn') as used in Serbia and Montenegro ('CS'). ('Latn') as used in Serbia and Montenegro ('CS').
"es-419" represents Spanish ('es') appropriate to the UN-defined "es-419" represents Spanish ('es') appropriate to the UN-defined
Latin America and Caribbean region ('419'). Latin America and Caribbean region ('419').
2.2.5 Variant Subtags 2.2.5. Variant Subtags
Variant subtags are used to indicate additional, well-recognized Variant subtags are used to indicate additional, well-recognized
variations that define a language or its dialects which are not variations that define a language or its dialects which are not
covered by other available subtags. The following rules apply to the covered by other available subtags. The following rules apply to the
variant subtags: variant subtags:
1. Variant subtags are not associated with any external standard. 1. Variant subtags are not associated with any external standard.
Variant subtags and their meanings are defined by the Variant subtags and their meanings are defined by the
registration process defined in Section 3.5. registration process defined in Section 3.5.
skipping to change at page 14, line 26 skipping to change at page 14, line 26
Most variants that share a prefix are mutually exclusive. For Most variants that share a prefix are mutually exclusive. For
example, the German orthographic variations '1996' and '1901' SHOULD example, the German orthographic variations '1996' and '1901' SHOULD
NOT be used in the same tag, as they represent the dates of different NOT be used in the same tag, as they represent the dates of different
spelling reforms. A variant that can meaningfully be used in spelling reforms. A variant that can meaningfully be used in
combination with another variant SHOULD include a 'Prefix' field in combination with another variant SHOULD include a 'Prefix' field in
its registry record that lists that other variant. For example, if its registry record that lists that other variant. For example, if
another German variant 'example' were created that made sense to use another German variant 'example' were created that made sense to use
with '1996', then 'example' should include two Prefix fields: "de" with '1996', then 'example' should include two Prefix fields: "de"
and "de-1996". and "de-1996".
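One possible reading of the 'Prefix' relationship is sketched below: a variant is used appropriately when the subtags preceding it begin with one of the prefixes listed in its registry record. The function name and the way the prefixes are passed in are assumptions made for illustration; the draft does not define such an interface, and 'example' is the hypothetical variant from the paragraph above.

```python
def variant_matches_prefix(tag, variant, prefixes):
    """Return True if the subtags before `variant` start with a listed prefix.

    `prefixes` is a list of strings such as ["de", "de-1996"], standing
    in for the 'Prefix' fields of the variant's registry record.
    """
    subtags = tag.lower().split("-")
    variant = variant.lower()
    if variant not in subtags:
        return False
    preceding = subtags[: subtags.index(variant)]
    for prefix in prefixes:
        prefix_subtags = prefix.lower().split("-")
        # Compare whole subtags so that "de" does not match "den".
        if preceding[: len(prefix_subtags)] == prefix_subtags:
            return True
    return False
```

With prefixes "de" and "de-1996", both "de-example" and "de-1996-example" pass this check, while "fr-example" does not.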
2.2.6 Extension Subtags 2.2.6. Extension Subtags
Extensions provide a mechanism for extending language tags for use in Extensions provide a mechanism for extending language tags for use in
various applications. See: Section 3.7. The following rules apply various applications. See: Section 3.7. The following rules apply
to extensions: to extensions:
1. Extension subtags are separated from the other subtags defined 1. Extension subtags are separated from the other subtags defined
in this document by a single character subtag ("singleton"). in this document by a single character subtag ("singleton").
The singleton MUST be one allocated to a registration authority The singleton MUST be one allocated to a registration authority
via the mechanism described in Section 3.7 and MUST NOT be the via the mechanism described in Section 3.7 and MUST NOT be the
letter 'x', which is reserved for private use subtag sequences. letter 'x', which is reserved for private use subtag sequences.
skipping to change at page 15, line 39 skipping to change at page 15, line 39
defined by the extension 'a'. defined by the extension 'a'.
11. In the event that more than one extension appears in a single 11. In the event that more than one extension appears in a single
tag, the tag SHOULD be canonicalized as described in tag, the tag SHOULD be canonicalized as described in
Section 4.4. Section 4.4.
For example, if the prefix singleton 'r' and the shown subtags were For example, if the prefix singleton 'r' and the shown subtags were
defined, then the following tag would be a valid example: "en-Latn- defined, then the following tag would be a valid example: "en-Latn-
GB-boont-r-extended-sequence-x-private" GB-boont-r-extended-sequence-x-private"
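The way singletons partition a tag can be sketched as follows. This is an illustration under stated assumptions, not a conforming parser: the function name is hypothetical, grandfathered tags such as "i-enochian" are not handled, and a repeated singleton would silently overwrite the earlier sequence rather than being rejected.

```python
def split_extensions(tag):
    """Split a tag into base subtags, extension sequences, and private use.

    Returns a tuple (base, extensions, private_use), where `extensions`
    maps each lowercase singleton to the list of subtags that follow it.
    """
    subtags = tag.split("-")
    base = []
    extensions = {}
    private_use = []
    current = None  # singleton whose sequence is being collected
    for index, subtag in enumerate(subtags):
        if len(subtag) == 1 and subtag.lower() == "x":
            # 'x' introduces private use; everything after it belongs there.
            private_use = subtags[index + 1:]
            break
        if len(subtag) == 1:
            current = subtag.lower()
            extensions[current] = []
        elif current is None:
            base.append(subtag)
        else:
            extensions[current].append(subtag)
    return base, extensions, private_use
```

Applied to the tag above, the sketch returns the base subtags 'en', 'Latn', 'GB', and 'boont', the sequence 'extended', 'sequence' under the singleton 'r', and the private use subtag 'private'. A tag with no extensions, such as "az-Arab-x-AZE-derbend", yields an empty extension mapping and the private use subtags 'AZE' and 'derbend'.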
2.2.7 Private Use Subtags 2.2.7. Private Use Subtags
Private use subtags are used to indicate distinctions in language Private use subtags are used to indicate distinctions in language
important in a given context by private agreement. The following important in a given context by private agreement. The following
rules apply to private use subtags: rules apply to private use subtags:
1. Private use subtags are separated from the other subtags defined 1. Private use subtags are separated from the other subtags defined
in this document by the reserved single-character subtag 'x'. in this document by the reserved single-character subtag 'x'.
2. Private use subtags MUST conform to the format and content 2. Private use subtags MUST conform to the format and content
constraints defined in the ABNF for all subtags. constraints defined in the ABNF for all subtags.
skipping to change at page 16, line 26 skipping to change at page 16, line 26
6. Private use subtags are NOT RECOMMENDED where alternatives exist 6. Private use subtags are NOT RECOMMENDED where alternatives exist
or for general interchange. See Section 4.5 for more information or for general interchange. See Section 4.5 for more information
on private use subtag choice. on private use subtag choice.
For example: Users who wished to utilize codes from the Ethnologue For example: Users who wished to utilize codes from the Ethnologue
publication of SIL International for language identification might publication of SIL International for language identification might
agree to exchange tags such as "az-Arab-x-AZE-derbend". This example agree to exchange tags such as "az-Arab-x-AZE-derbend". This example
contains two private use subtags. The first is 'AZE' and the second contains two private use subtags. The first is 'AZE' and the second
is 'derbend'. is 'derbend'.
2.2.8 Pre-Existing RFC 3066 Registrations 2.2.8. Pre-Existing RFC 3066 Registrations
Existing IANA-registered language tags from RFC 1766 and/or RFC 3066 Existing IANA-registered language tags from RFC 1766 and/or RFC 3066
maintain their validity. These tags will be maintained in the maintain their validity. These tags will be maintained in the
registry in records of either the "grandfathered" or "redundant" registry in records of either the "grandfathered" or "redundant"
type. Grandfathered tags contain one or more subtags that are not type. Grandfathered tags contain one or more subtags that are not
defined in the Language Subtag Registry (see Section 3). Redundant defined in the Language Subtag Registry (see Section 3). Redundant
tags consist entirely of subtags defined above and whose independent tags consist entirely of subtags defined above and whose independent
registration is superseded by this document. For more information registration is superseded by this document. For more information
see Section 3.8. see Section 3.8.
It is important to note that all language tags formed under the It is important to note that all language tags formed under the
guidelines in this document were either legal, well-formed tags or guidelines in this document were either legal, well-formed tags or
could have been registered under RFC 3066. could have been registered under RFC 3066.
2.2.9 Classes of Conformance 2.2.9. Classes of Conformance
Implementations sometimes need to describe their capabilities with Implementations sometimes need to describe their capabilities with
regard to the rules and practices described in this document. There regard to the rules and practices described in this document. There
are two classes of conforming implementations described by this are two classes of conforming implementations described by this
document: "well-formed" processors and "validating" processors. document: "well-formed" processors and "validating" processors.
Claims of conformance SHOULD explicitly reference one of these Claims of conformance SHOULD explicitly reference one of these
definitions. definitions.
An implementation that claims to check for well-formed language tags An implementation that claims to check for well-formed language tags
MUST: MUST:
skipping to change at page 18, line 21 skipping to change at page 18, line 21
The Language Subtag Registry contains a comprehensive list of all of The Language Subtag Registry contains a comprehensive list of all of
the subtags valid in language tags. This allows implementers a the subtags valid in language tags. This allows implementers a
straightforward and reliable way to validate language tags. The straightforward and reliable way to validate language tags. The
Language Subtag Registry will be maintained so that, except for Language Subtag Registry will be maintained so that, except for
extension subtags, it is possible to validate all of the subtags that extension subtags, it is possible to validate all of the subtags that
appear in a language tag under the provisions of this document or its appear in a language tag under the provisions of this document or its
revisions or successors. In addition, the meaning of the various revisions or successors. In addition, the meaning of the various
subtags will be unambiguous and stable over time. (The meaning of subtags will be unambiguous and stable over time. (The meaning of
private use subtags, of course, is not defined by the IANA registry.) private use subtags, of course, is not defined by the IANA registry.)
3.1 Format of the IANA Language Subtag Registry 3.1. Format of the IANA Language Subtag Registry
The IANA Language Subtag Registry ("the registry") consists of a text The IANA Language Subtag Registry ("the registry") consists of a text
file that is machine readable in the format described in this file that is machine readable in the format described in this
section, plus copies of the registration forms approved in accordance section, plus copies of the registration forms approved in accordance
with the process described in Section 3.5. The existing registration with the process described in Section 3.5. The existing registration
forms for grandfathered and redundant tags taken from RFC 3066 will forms for grandfathered and redundant tags taken from RFC 3066 will
be maintained as part of the obsolete RFC 3066 registry. The be maintained as part of the obsolete RFC 3066 registry. The
remaining set of initial subtags will not have registration forms remaining set of initial subtags will not have registration forms
created for them. created for them.
skipping to change at page 18, line 44 skipping to change at page 18, line 44
Each line of text is limited to 72 characters, including all Each line of text is limited to 72 characters, including all
whitespace. Records are separated by lines containing only the whitespace. Records are separated by lines containing only the
sequence "%%" (%x25.25). sequence "%%" (%x25.25).
Each field can be viewed as a single, logical line of ASCII Each field can be viewed as a single, logical line of ASCII
characters, comprising a field-name and a field-body separated by a characters, comprising a field-name and a field-body separated by a
COLON character (%x3A). For convenience, the field-body portion of COLON character (%x3A). For convenience, the field-body portion of
this conceptual entity can be split into a multiple-line this conceptual entity can be split into a multiple-line
representation; this is called "folding". The format of the registry representation; this is called "folding". The format of the registry
is described by the following ABNF (per [RFC2234bis]): is described by the following ABNF (per [RFC4234]):
registry = record *("%%" CRLF record) registry = record *("%%" CRLF record)
record = 1*( field-name *SP ":" *SP field-body CRLF ) record = 1*( field-name *SP ":" *SP field-body CRLF )
field-name = (ALPHA / DIGIT)[*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)] field-name = (ALPHA / DIGIT)[*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]
field-body = *(ASCCHAR/LWSP) field-body = *(ASCCHAR/LWSP)
ASCCHAR = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26 ASCCHAR = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26
UNICHAR = "&#x" 2*6HEXDIG ";" UNICHAR = "&#x" 2*6HEXDIG ";"
Figure 2: record-jar ABNF Figure 2: registry format ABNF
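The following is a rough sketch of a reader for the format in Figure 2,
not a reference implementation. It assumes that a folded field-body
continues on lines that begin with whitespace (the exact folding rule
is not shown in this excerpt), and the names parse_registry and
decode_unichar are hypothetical. The sample values are illustrative
only.

```python
import re

# UNICHAR = "&#x" 2*6HEXDIG ";" from Figure 2.
UNICHAR = re.compile(r"&#x([0-9A-Fa-f]{2,6});")


def parse_registry(text: str) -> list[dict[str, list[str]]]:
    """Split registry text into records of field-name -> field-bodies."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    last_name = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.strip() == "%%":
            # "%%" on a line of its own separates records.
            if current:
                records.append(current)
            current, last_name = {}, None
        elif line[0] in " \t" and last_name is not None:
            # Assumed folding rule: leading whitespace continues a body.
            current[last_name][-1] += " " + line.strip()
        elif ":" in line:
            name, _, body = line.partition(":")
            last_name = name.strip()
            current.setdefault(last_name, []).append(body.strip())
    if current:
        records.append(current)
    return records


def decode_unichar(body: str) -> str:
    """Replace each UNICHAR escape with the character it denotes."""
    def replace(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        # Leave escapes beyond the Unicode range untouched.
        return chr(code_point) if code_point <= 0x10FFFF else match.group(0)
    return UNICHAR.sub(replace, body)


sample = (
    "File-Date: 2005-10-14\n"
    "%%\n"
    "Type: language\n"
    "Subtag: nb\n"
    "Description: Norwegian\n"
    "  Bokm&#xE5;l\n"
)
records = parse_registry(sample)
print(decode_unichar(records[1]["Description"][0]))
```

Each field is stored as a list because a field-name may occur more than
once in a record; a stricter reader would also enforce the 72-character
line limit and the field-name grammar.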
The sequence '..' (%x2E.2E) in a field-body denotes a range of The sequence '..' (%x2E.2E) in a field-body denotes a range of
values. Such a range represents all subtags of the same length that values. Such a range represents all subtags of the same length that
are in alphabetic or numeric order within that range, including the are in alphabetic or numeric order within that range, including the
values explicitly mentioned. For example 'a..c' denotes the values values explicitly mentioned. For example 'a..c' denotes the values
'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and 'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and
'13'. '13'.
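One way to expand such a range is sketched below. It assumes that both
endpoints have the same length and are either all ASCII digits or all
ASCII letters, and it treats letters as base-26 digits; the name
expand_range is hypothetical.

```python
def _letters_to_int(value: str) -> int:
    """Read an ASCII-letter string as a base-26 number (a=0 .. z=25)."""
    number = 0
    for char in value.lower():
        number = number * 26 + (ord(char) - ord("a"))
    return number


def _int_to_letters(number: int, template: str) -> str:
    """Write a base-26 number using the length and case of template."""
    chars = []
    for model in reversed(template):
        number, digit = divmod(number, 26)
        char = chr(ord("a") + digit)
        chars.append(char.upper() if model.isupper() else char)
    return "".join(reversed(chars))


def expand_range(spec: str) -> list[str]:
    """Expand 'first..last' into every value in the range, inclusive."""
    first, separator, last = spec.partition("..")
    if not separator:
        return [spec]  # not a range
    if len(first) != len(last) or not (first.isascii() and last.isascii()):
        raise ValueError(f"unsupported range: {spec!r}")
    if first.isdigit() and last.isdigit():
        width = len(first)
        return [str(n).zfill(width) for n in range(int(first), int(last) + 1)]
    if first.isalpha() and last.isalpha():
        low, high = _letters_to_int(first), _letters_to_int(last)
        return [_int_to_letters(n, first) for n in range(low, high + 1)]
    raise ValueError(f"unsupported range: {spec!r}")


print(expand_range("a..c"))
print(expand_range("11..13"))
```

The two calls reproduce the examples given above: 'a', 'b', 'c' and
'11', '12', '13'. Preserving the case pattern of the first endpoint is
a design choice of this sketch, not a requirement stated here.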
Characters from outside the US-ASCII [ISO646] repertoire, as well as Characters from outside the US-ASCII [ISO646] repertoire, as well as
the AMPERSAND character ("&", %x26) when it occurs in a field-body the AMPERSAND character ("&", %x26) when it occurs in a field-body
skipping to change at page 23, line 42 skipping to change at page 23, line 42
field-value is 'language'. This field MUST NOT appear more than one field-value is 'language'. This field MUST NOT appear more than one
time in a record. This field indicates a script used to write the time in a record. This field indicates a script used to write the
overwhelming majority of documents for the given language and which overwhelming majority of documents for the given language and which
therefore adds no distinguishing information to a language tag. It therefore adds no distinguishing information to a language tag. It
helps ensure greater compatibility between the language tags helps ensure greater compatibility between the language tags
generated according to the rules in this document and language tags generated according to the rules in this document and language tags
and tag processors or consumers based on RFC 3066. For example, and tag processors or consumers based on RFC 3066. For example,
virtually all Icelandic documents are written in the Latin script, virtually all Icelandic documents are written in the Latin script,
making the subtag 'Latn' redundant in the tag "is-Latn". making the subtag 'Latn' redundant in the tag "is-Latn".
3.2 Language Subtag Reviewer 3.2. Language Subtag Reviewer
The Language Subtag Reviewer is appointed by the IESG for an The Language Subtag Reviewer is appointed by the IESG for an
indefinite term, subject to removal or replacement at the IESG's indefinite term, subject to removal or replacement at the IESG's
discretion. The Language Subtag Reviewer moderates the ietf- discretion. The Language Subtag Reviewer moderates the ietf-
languages mailing list, responds to requests for registration, and languages mailing list, responds to requests for registration, and
performs the other registry maintenance duties described in performs the other registry maintenance duties described in
Section 3.3. Only the Language Subtag Reviewer is permitted to Section 3.3. Only the Language Subtag Reviewer is permitted to
request IANA to change, update or add records to the Language Subtag request IANA to change, update or add records to the Language Subtag
Registry. Registry.
The performance or decisions of the Language Subtag Reviewer MAY be The performance or decisions of the Language Subtag Reviewer MAY be
appealed to the IESG under the same rules as other IETF decisions appealed to the IESG under the same rules as other IETF decisions
(see [RFC2026]). The IESG can reverse or overturn the decision of (see [RFC2026]). The IESG can reverse or overturn the decision of
the Language Subtag Reviewer, provide guidance, or take other the Language Subtag Reviewer, provide guidance, or take other
appropriate actions. appropriate actions.
3.3 Maintenance of the Registry 3.3. Maintenance of the Registry
Maintenance of the registry requires that as codes are assigned or Maintenance of the registry requires that as codes are assigned or
withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language
Subtag Reviewer MUST evaluate each change, determine whether it Subtag Reviewer MUST evaluate each change, determine whether it
conflicts with existing registry entries, and submit the information conflicts with existing registry entries, and submit the information
to IANA for inclusion in the registry. If a change takes place and to IANA for inclusion in the registry. If a change takes place and
the Language Subtag Reviewer does not do this in a timely manner, the Language Subtag Reviewer does not do this in a timely manner,
then any interested party MAY use the procedure in Section 3.5 to then any interested party MAY use the procedure in Section 3.5 to
register the appropriate update. register the appropriate update.
skipping to change at page 25, line 33 skipping to change at page 25, line 33
Figure 4: Example of a Language Subtag Modification Form Figure 4: Example of a Language Subtag Modification Form
Whenever an entry is created or modified in the registry, the 'File- Whenever an entry is created or modified in the registry, the 'File-
Date' record at the start of the registry is updated to reflect the Date' record at the start of the registry is updated to reflect the
most recent modification date in the [RFC3339] "full-date" format. most recent modification date in the [RFC3339] "full-date" format.
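A small sketch of how a tool might check and update the 'File-Date'
value follows. It assumes the [RFC3339] "full-date" form is exactly
YYYY-MM-DD; the names parse_full_date and updated_file_date are
hypothetical.

```python
import re
from datetime import date

# full-date = 4DIGIT "-" 2DIGIT "-" 2DIGIT
FULL_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_full_date(value: str) -> date:
    """Parse a full-date string; raise ValueError if it is malformed."""
    match = FULL_DATE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a full-date: {value!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)  # also rejects dates such as 2005-02-30


def updated_file_date(current: str, modified: date) -> str:
    """Return the File-Date value after a modification made on 'modified'."""
    return max(parse_full_date(current), modified).isoformat()


print(updated_file_date("2005-09-22", date(2005, 10, 14)))
```

Taking the later of the two dates keeps the field at the most recent
modification date even if updates are processed out of order.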
Before forwarding a new registration to IANA, the Language Subtag Before forwarding a new registration to IANA, the Language Subtag
Reviewer MUST ensure that values in the 'Subtag' field match case Reviewer MUST ensure that values in the 'Subtag' field match case
according to the description in Section 3.1. according to the description in Section 3.1.
3.4 Stability of IANA Registry Entries 3.4. Stability of IANA Registry Entries
The stability of entries and their meaning in the registry is The stability of entries and their meaning in the registry is
critical to the long term stability of language tags. The rules in critical to the long term stability of language tags. The rules in
this section guarantee that a specific language tag's meaning is this section guarantee that a specific language tag's meaning is
stable over time and will not change. stable over time and will not change.
These rules specifically deal with how changes to codes (including These rules specifically deal with how changes to codes (including
withdrawal and deprecation of codes) maintained by ISO 639, ISO withdrawal and deprecation of codes) maintained by ISO 639, ISO
15924, ISO 3166, and UN M.49 are reflected in the IANA Language 15924, ISO 3166, and UN M.49 are reflected in the IANA Language
Subtag Registry. Assignments to the IANA Language Subtag Registry Subtag Registry. Assignments to the IANA Language Subtag Registry
skipping to change at page 28, line 47 skipping to change at page 28, line 47
become valid subtags in the IANA registry, then the field 'Type' become valid subtags in the IANA registry, then the field 'Type'
in that record is changed from 'grandfathered' to 'redundant'. in that record is changed from 'grandfathered' to 'redundant'.
Note that this will not affect language tags that match the Note that this will not affect language tags that match the
grandfathered tag, since these tags will now match valid grandfathered tag, since these tags will now match valid
generative subtag sequences. For example, if the subtag 'gan' generative subtag sequences. For example, if the subtag 'gan'
in the language tag "zh-gan" were to be registered as an in the language tag "zh-gan" were to be registered as an
extended language subtag, then the grandfathered tag "zh-gan" extended language subtag, then the grandfathered tag "zh-gan"
would be deprecated (but existing content or implementations would be deprecated (but existing content or implementations
that use "zh-gan" would remain valid). that use "zh-gan" would remain valid).
3.5 Registration Procedure for Subtags 3.5. Registration Procedure for Subtags
The procedure given here MUST be used by anyone who wants to use a The procedure given here MUST be used by anyone who wants to use a
subtag not currently in the IANA Language Subtag Registry. subtag not currently in the IANA Language Subtag Registry.
Only subtags of type 'language' and 'variant' will be considered for Only subtags of type 'language' and 'variant' will be considered for
independent registration of new subtags. Handling of subtags needed independent registration of new subtags. Handling of subtags needed
for stability and subtags necessary to keep the registry synchronized for stability and subtags necessary to keep the registry synchronized
with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits
defined by this document are described in Section 3.3. Stability defined by this document are described in Section 3.3. Stability
provisions are described in Section 3.4. provisions are described in Section 3.4.
skipping to change at page 31, line 44 skipping to change at page 31, line 44
refers to. In most cases, reference to an authoritative grammar or refers to. In most cases, reference to an authoritative grammar or
dictionary of that language will be useful; in cases where no such dictionary of that language will be useful; in cases where no such
work exists, other well known works describing that language or in work exists, other well known works describing that language or in
that language MAY be appropriate. The Language Subtag Reviewer that language MAY be appropriate. The Language Subtag Reviewer
decides what constitutes "good enough" reference material. This decides what constitutes "good enough" reference material. This
requirement is not intended to exclude particular languages or requirement is not intended to exclude particular languages or
dialects due to the size of the speaker population or lack of a dialects due to the size of the speaker population or lack of a
standardized orthography. Minority languages will be considered standardized orthography. Minority languages will be considered
equally on their own merits. equally on their own merits.
3.6 Possibilities for Registration 3.6. Possibilities for Registration
Possibilities for registration of subtags or information about Possibilities for registration of subtags or information about
subtags include: subtags include:
o Primary language subtags for languages not listed in ISO 639 that o Primary language subtags for languages not listed in ISO 639 that
are not variants of any listed or registered language MAY be are not variants of any listed or registered language MAY be
registered. At the time this document was created there were no registered. At the time this document was created there were no
examples of this form of subtag. Before attempting to register a examples of this form of subtag. Before attempting to register a
language subtag, there MUST be an attempt to register the language language subtag, there MUST be an attempt to register the language
with ISO 639. Subtags MUST NOT be registered for codes that exist with ISO 639. Subtags MUST NOT be registered for codes that exist
skipping to change at page 33, line 43 skipping to change at page 33, line 43
Statistical Services Branch Statistical Services Branch
Statistics Division Statistics Division
United Nations, Room DC2-1620 United Nations, Room DC2-1620
New York, NY 10017, USA New York, NY 10017, USA
Fax: +1-212-963-0623 Fax: +1-212-963-0623
E-mail: statistics@un.org E-mail: statistics@un.org
URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm
3.7 Extensions and Extensions Registry 3.7. Extensions and Extensions Registry
Extension subtags are those introduced by single character subtags Extension subtags are those introduced by single character subtags
("singletons") other than 'x'. They are reserved for the generation ("singletons") other than 'x'. They are reserved for the generation
of identifiers which contain a language component, and are compatible of identifiers which contain a language component, and are compatible
with applications that understand language tags. with applications that understand language tags.
The structure and form of extensions are defined by this document so The structure and form of extensions are defined by this document so
that implementations can be created that are forward compatible with that implementations can be created that are forward compatible with
applications that might be created using singletons in the future. applications that might be created using singletons in the future.
In addition, defining a mechanism for maintaining singletons will In addition, defining a mechanism for maintaining singletons will
skipping to change at page 36, line 36 skipping to change at page 36, line 36
that the most significant information be in the most significant that the most significant information be in the most significant
(left-most) subtags, and that the specification gracefully handle (left-most) subtags, and that the specification gracefully handle
truncated subtags. truncated subtags.
When a language tag is to be used in a specific, known, protocol, it When a language tag is to be used in a specific, known, protocol, it
is RECOMMENDED that the language tag not contain extensions not is RECOMMENDED that the language tag not contain extensions not
supported by that protocol. In addition, note that some protocols supported by that protocol. In addition, note that some protocols
MAY impose upper limits on the length of the strings used to store or MAY impose upper limits on the length of the strings used to store or
transport the language tag. transport the language tag.
3.8 Initialization of the Registries 3.8. Initialization of the Registries
Upon adoption of this document an initial version of the Language Upon adoption of this document an initial version of the Language
Subtag Registry containing the various subtags initially valid in a Subtag Registry containing the various subtags initially valid in a
language tag is necessary. This collection of subtags, along with a language tag is necessary. This collection of subtags, along with a
description of the process used to create it, is described by description of the process used to create it, is described by
[initial-registry]. IANA SHALL publish the initial version of the [initial-registry]. IANA SHALL publish the initial version of the
registry described by this document from the content of [initial- registry described by this document from the content of [initial-
registry]. Once published by IANA, the maintenance procedures, rules registry]. Once published by IANA, the maintenance procedures, rules
and registration processes described in this document will be and registration processes described in this document will be
available for new registrations or updates. available for new registrations or updates.
skipping to change at page 38, line 10 skipping to change at page 38, line 10
An initial version of the Language Extension Registry described in An initial version of the Language Extension Registry described in
Section 3.7 is also needed. The Language Extension Registry SHALL be Section 3.7 is also needed. The Language Extension Registry SHALL be
initialized with a single record containing a single field of type initialized with a single record containing a single field of type
"File-Date" as a placeholder for future assignments. "File-Date" as a placeholder for future assignments.
4. Formation and Processing of Language Tags 4. Formation and Processing of Language Tags
This section addresses how to use the information in the registry This section addresses how to use the information in the registry
with the tag syntax to choose, form and process language tags. with the tag syntax to choose, form and process language tags.
4.1 Choice of Language Tag 4.1. Choice of Language Tag
One is sometimes faced with the choice between several possible tags One is sometimes faced with the choice between several possible tags
for the same body of text. for the same body of text.
Interoperability is best served when all users use the same language Interoperability is best served when all users use the same language
tag in order to represent the same language. If an application has tag in order to represent the same language. If an application has
requirements that make the rules here inapplicable, then that requirements that make the rules here inapplicable, then that
application risks damaging interoperability. It is strongly application risks damaging interoperability. It is strongly
RECOMMENDED that users not define their own rules for language tag RECOMMENDED that users not define their own rules for language tag
choice. choice.
skipping to change at page 40, line 21 skipping to change at page 40, line 21
a language tag. a language tag.
* For example, do not use "de-DE-1901-1901". * For example, do not use "de-DE-1901-1901".
To ensure consistent backward compatibility, this document contains To ensure consistent backward compatibility, this document contains
several provisions to account for potential instability in the several provisions to account for potential instability in the
standards used to define the subtags that make up language tags. standards used to define the subtags that make up language tags.
These provisions mean that no language tag created under the rules in These provisions mean that no language tag created under the rules in
this document will become obsolete. this document will become obsolete.
4.2 Meaning of the Language Tag 4.2. Meaning of the Language Tag
The relationship between the tag and the information it relates to is The relationship between the tag and the information it relates to is
defined by the context in which the tag appears. Accordingly, this defined by the context in which the tag appears. Accordingly, this
section gives only possible examples of its usage. section gives only possible examples of its usage.
o For a single information object, the associated language tags o For a single information object, the associated language tags
might be interpreted as the set of languages that is necessary for might be interpreted as the set of languages that is necessary for
a complete comprehension of the complete object. Example: Plain a complete comprehension of the complete object. Example: Plain
text documents. text documents.
skipping to change at page 41, line 25 skipping to change at page 41, line 25
languages that begin with the same sequence of subtags are NOT languages that begin with the same sequence of subtags are NOT
guaranteed to be mutually intelligible, although they might be. For guaranteed to be mutually intelligible, although they might be. For
example, the tag "az" shares a prefix with both "az-Latn" example, the tag "az" shares a prefix with both "az-Latn"
(Azerbaijani written using the Latin script) and "az-Cyrl" (Azerbaijani written using the Latin script) and "az-Cyrl"
(Azerbaijani written using the Cyrillic script). A person fluent in (Azerbaijani written using the Cyrillic script). A person fluent in
one script might not be able to read the other, even though the text one script might not be able to read the other, even though the text
might be identical. Content tagged as "az" most probably is written might be identical. Content tagged as "az" most probably is written
in just one script and thus might not be intelligible to a reader in just one script and thus might not be intelligible to a reader
familiar with the other script. familiar with the other script.
4.3 Length Considerations 4.3. Length Considerations
[RFC3066] did not provide an upper limit on the size of language [RFC3066] did not provide an upper limit on the size of language
tags. While RFC 3066 did define the semantics of particular subtags tags. While RFC 3066 did define the semantics of particular subtags
in such a way that most language tags consisted of language and in such a way that most language tags consisted of language and
region subtags with a combined total length of up to six characters, region subtags with a combined total length of up to six characters,
larger registered tags were not only possible but were actually larger registered tags were not only possible but were actually
registered. registered.
Neither the language tag syntax nor other requirements in this Neither the language tag syntax nor other requirements in this
document impose a fixed upper limit on the number of subtags in a document impose a fixed upper limit on the number of subtags in a
language tag (and thus an upper bound on the size of a tag). The language tag (and thus an upper bound on the size of a tag). The
language tag syntax suggests that, depending on the specific language tag syntax suggests that, depending on the specific
language, more subtags (and thus a longer tag) are sometimes language, more subtags (and thus a longer tag) are sometimes
necessary to completely identify the language for certain necessary to completely identify the language for certain
applications; thus it is possible to envision long or complex subtag applications; thus it is possible to envision long or complex subtag
sequences. sequences.
4.3.1 Working with Limited Buffer Sizes 4.3.1. Working with Limited Buffer Sizes
Some applications and protocols are forced to allocate fixed buffer Some applications and protocols are forced to allocate fixed buffer
sizes or otherwise limit the length of a language tag. A conformant sizes or otherwise limit the length of a language tag. A conformant
implementation or specification MAY refuse to support the storage of implementation or specification MAY refuse to support the storage of
language tags which exceed a specified length. Any such limitation language tags which exceed a specified length. Any such limitation
SHOULD be clearly documented, and such documentation SHOULD include SHOULD be clearly documented, and such documentation SHOULD include
what happens to longer tags (for example, whether an error value is what happens to longer tags (for example, whether an error value is
generated or the language tag is truncated). A protocol that allows generated or the language tag is truncated). A protocol that allows
tags to be truncated at an arbitrary limit, without giving any tags to be truncated at an arbitrary limit, without giving any
indication of what that limit is, has the potential for causing harm indication of what that limit is, has the potential for causing harm
skipping to change at page 43, line 18 skipping to change at page 43, line 18
extlang3 = 4 (extremely unlikely) extlang3 = 4 (extremely unlikely)
script = 5 (if not suppressed: see Section 4.1) script = 5 (if not suppressed: see Section 4.1)
region = 4 (UN M.49; ISO 3166 requires 3) region = 4 (UN M.49; ISO 3166 requires 3)
variant1 = 9 (MUST have language as a prefix) variant1 = 9 (MUST have language as a prefix)
variant2 = 9 (MUST have language-variant1 as a prefix) variant2 = 9 (MUST have language-variant1 as a prefix)
total = 42 characters total = 42 characters
Figure 7: Derivation of the Limit on Tag Length Figure 7: Derivation of the Limit on Tag Length
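The arithmetic can be checked with a short sketch. Only the rows of
Figure 7 that appear in this excerpt are listed; the rows that are not
shown are left as a remainder rather than guessed. The name fits_limit
is hypothetical.

```python
# Rows of Figure 7 visible in this excerpt.
VISIBLE_ROWS = {
    "extlang3": 4,
    "script": 5,
    "region": 4,
    "variant1": 9,
    "variant2": 9,
}
TOTAL = 42  # total stated in Figure 7

visible = sum(VISIBLE_ROWS.values())
remaining = TOTAL - visible  # characters contributed by rows not shown


def fits_limit(tag: str, limit: int = TOTAL) -> bool:
    """Report whether a tag fits in a buffer of 'limit' characters."""
    return len(tag) <= limit


print(visible, remaining)
```

The visible rows sum to 31 characters, so the rows not shown here
account for the remaining 11.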
4.3.2 Truncation of Language Tags 4.3.2. Truncation of Language Tags
Truncation of a language tag alters the meaning of the tag, and thus Truncation of a language tag alters the meaning of the tag, and thus
SHOULD be avoided. However, truncation of language tags is sometimes SHOULD be avoided. However, truncation of language tags is sometimes
necessary due to limited buffer sizes. Such truncation MUST NOT necessary due to limited buffer sizes. Such truncation MUST NOT
permit a subtag to be chopped off in the middle or the formation of permit a subtag to be chopped off in the middle or the formation of
invalid tags (for example, one ending with the "-" character). invalid tags (for example, one ending with the "-" character).
This means that applications or protocols which truncate tags MUST do This means that applications or protocols which truncate tags MUST do
so by progressively removing subtags along with their preceding "-" so by progressively removing subtags along with their preceding "-"
from the right side of the language tag until the tag is short enough from the right side of the language tag until the tag is short enough
skipping to change at page 43, line 43 skipping to change at page 43, line 43
Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1 Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
1. zh-Latn-CN-variant1-a-extend1-x-wadegile 1. zh-Latn-CN-variant1-a-extend1-x-wadegile
2. zh-Latn-CN-variant1-a-extend1 2. zh-Latn-CN-variant1-a-extend1
3. zh-Latn-CN-variant1 3. zh-Latn-CN-variant1
4. zh-Latn-CN 4. zh-Latn-CN
5. zh-Latn 5. zh-Latn
6. zh 6. zh
Figure 8: Example of Tag Truncation Figure 8: Example of Tag Truncation
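The steps listed in Figure 8 can be reproduced with the sketch below.
It removes one subtag at a time from the right and, as the figure
suggests, also removes a singleton that would otherwise be left at the
end of the tag. The names truncation_steps and truncate_tag are
hypothetical.

```python
def _drop_last(subtags: list[str]) -> None:
    """Remove the last subtag and any singleton left dangling after it."""
    subtags.pop()
    while len(subtags) > 1 and len(subtags[-1]) == 1:
        subtags.pop()


def truncation_steps(tag: str) -> list[str]:
    """List every successively shorter form of the tag."""
    subtags = tag.split("-")
    steps = []
    while len(subtags) > 1:
        _drop_last(subtags)
        steps.append("-".join(subtags))
    return steps


def truncate_tag(tag: str, max_length: int) -> str:
    """Shorten the tag on subtag boundaries until it fits max_length."""
    subtags = tag.split("-")
    while len(subtags) > 1 and len("-".join(subtags)) > max_length:
        _drop_last(subtags)
    return "-".join(subtags)


tag = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
for number, step in enumerate(truncation_steps(tag), start=1):
    print(f"{number}. {step}")
print(truncate_tag(tag, 20))
```

Because a subtag is never cut in the middle, truncate_tag returns the
first subtag unchanged even when that subtag alone exceeds max_length;
a caller with a hard limit would need to treat that case as an error.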
4.4 Canonicalization of Language Tags 4.4. Canonicalization of Language Tags
Since a particular language tag is sometimes used by many processes, Since a particular language tag is sometimes used by many processes,
language tags SHOULD always be created or generated in a canonical language tags SHOULD always be created or generated in a canonical
form. form.
A language tag is in canonical form when: A language tag is in canonical form when:
1. The tag is well-formed according to the rules in Section 2.1 and 1. The tag is well-formed according to the rules in Section 2.1 and
Section 2.2. Section 2.2.
skipping to change at page 45, line 36 skipping to change at page 45, line 32
define how the order of the extension's subtags are interpreted. For define how the order of the extension's subtags are interpreted. For
example, an extension could define that its subtags are in canonical example, an extension could define that its subtags are in canonical
order when the subtags are placed into ASCII order: that is, "en-a- order when the subtags are placed into ASCII order: that is, "en-a-
aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension might aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension might
define that the order of the subtags influences their semantic define that the order of the subtags influences their semantic
meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b- meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b-
aaa-bbb-ccc"). However, extension specifications SHOULD be designed aaa-bbb-ccc"). However, extension specifications SHOULD be designed
so that they are tolerant of the typical processes described in so that they are tolerant of the typical processes described in
Section 3.7. Section 3.7.
4.5 Considerations for Private Use Subtags 4.5. Considerations for Private Use Subtags
Private use subtags, like all other subtags, MUST conform to the Private use subtags, like all other subtags, MUST conform to the
format and content constraints in the ABNF. Private use subtags have format and content constraints in the ABNF. Private use subtags have
no meaning outside the private agreement between the parties that no meaning outside the private agreement between the parties that
intend to use or exchange language tags that employ them. The same intend to use or exchange language tags that employ them. The same
subtags MAY be used with a different meaning under a separate private subtags MAY be used with a different meaning under a separate private
agreement. They SHOULD NOT be used where alternatives exist and agreement. They SHOULD NOT be used where alternatives exist and
SHOULD NOT be used in content or protocols intended for general use. SHOULD NOT be used in content or protocols intended for general use.
Private use subtags are simply useless for information exchange Private use subtags are simply useless for information exchange
skipping to change at page 47, line 16 skipping to change at page 47, line 16
This section deals with the processes and requirements necessary for This section deals with the processes and requirements necessary for
IANA to undertake to maintain the subtag and extension registries as IANA to undertake to maintain the subtag and extension registries as
defined by this document and in accordance with the requirements of defined by this document and in accordance with the requirements of
[RFC2434]. [RFC2434].
The impact on the IANA maintainers of the two registries defined by The impact on the IANA maintainers of the two registries defined by
this document will be a small increase in the frequency of new this document will be a small increase in the frequency of new
entries or updates. entries or updates.
5.1 Language Subtag Registry 5.1. Language Subtag Registry
Upon adoption of this document, the registry will be initialized by a Upon adoption of this document, the registry will be initialized by a
companion document: [initial-registry]. The criteria and process for companion document: [initial-registry]. The criteria and process for
selecting the initial set of records are described in that document. selecting the initial set of records are described in that document.
The initial set of records represents no impact on IANA, since the The initial set of records represents no impact on IANA, since the
work to create it will be performed externally. work to create it will be performed externally.
The new registry MUST be listed under "Language Tags" at The new registry MUST be listed under "Language Tags" at
<http://www.iana.org/numbers.html>, replacing the existing <http://www.iana.org/numbers.html>, replacing the existing
registrations defined by [RFC3066]. The existing set of registration registrations defined by [RFC3066]. The existing set of registration
skipping to change at page 47, line 50 skipping to change at page 47, line 50
IANA MUST place any inserted or modified records into the appropriate IANA MUST place any inserted or modified records into the appropriate
section of the language subtag registry, grouping the records by section of the language subtag registry, grouping the records by
their 'Type' field. Inserted records MAY be placed anywhere in the their 'Type' field. Inserted records MAY be placed anywhere in the
appropriate section; there is no guarantee of the order of the appropriate section; there is no guarantee of the order of the
records beyond grouping them together by 'Type'. Modified records records beyond grouping them together by 'Type'. Modified records
MUST overwrite the record they replace. MUST overwrite the record they replace.
Included in any request to insert or modify records MUST be a new Included in any request to insert or modify records MUST be a new
File-Date record. This record MUST be placed first in the registry. File-Date record. This record MUST be placed first in the registry.
In the event that the File-Date record present in the registry has a In the event that the File-Date record present in the registry has a
later date then the record being inserted or modified, the existing later date than the record being inserted or modified, the existing
record MUST be preserved. record MUST be preserved.
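One possible reading of the File-Date rule above is sketched below. It assumes that File-Date values are RFC 3339 full-date strings ("YYYY-MM-DD") and that "the existing record" refers to the File-Date record already in the registry; the function name is hypothetical.

```python
from datetime import date


def resolve_file_date(existing: str, incoming: str) -> str:
    """Return the File-Date value the registry should carry after an update.

    Assumption: both values are full-date strings such as "2005-10-14".
    """
    if date.fromisoformat(existing) > date.fromisoformat(incoming):
        # The registry already carries a later date, so it is preserved.
        return existing
    return incoming


print(resolve_file_date("2005-10-14", "2005-09-22"))  # 2005-10-14
print(resolve_file_date("2005-09-22", "2005-10-14"))  # 2005-10-14
```

Under this reading the File-Date in the registry never moves backwards, even if requests are processed out of order.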
5.2 Extensions Registry 5.2. Extensions Registry
The Language Tag Extensions registry will also be generated and sent The Language Tag Extensions registry will also be generated and sent
to IANA as described in Section 3.7. This registry can contain at to IANA as described in Section 3.7. This registry can contain at
most 35 records and thus changes to this registry are expected to be most 35 records and thus changes to this registry are expected to be
very infrequent. very infrequent.
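The limit of 35 records can be checked with a short sketch. It assumes that extension singletons are the ASCII letters and digits compared case-insensitively, with "x" excluded because it is reserved for private use; the function name is illustrative.

```python
import string


def available_extension_singletons() -> list[str]:
    """List the single-character subtags that could identify an extension."""
    candidates = string.ascii_lowercase + string.digits  # 26 letters + 10 digits
    # 'x' introduces private use subtags, so it cannot be registered.
    return [c for c in candidates if c != "x"]


print(len(available_extension_singletons()))  # 35
```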
Future work by IANA on the Language Tag Extensions Registry is Future work by IANA on the Language Tag Extensions Registry is
limited to two cases. First, the IESG MAY request that new records limited to two cases. First, the IESG MAY request that new records
be inserted into this registry from time to time. These requests be inserted into this registry from time to time. These requests
MUST include the record to insert in the exact format described in MUST include the record to insert in the exact format described in
skipping to change at page 54, line 7 skipping to change at page 54, line 7
as the mechanism for creating private use language, script, and as the mechanism for creating private use language, script, and
region subtags respectively. region subtags respectively.
o Adds a well-defined extension mechanism. o Adds a well-defined extension mechanism.
o Defines an extended language subtag, possibly for use with certain o Defines an extended language subtag, possibly for use with certain
anticipated features of ISO 639-3. anticipated features of ISO 639-3.
9. References 9. References
9.1 Normative References 9.1. Normative References
[ISO10646] [ISO10646]
International Organization for Standardization, "ISO/IEC International Organization for Standardization, "ISO/IEC
10646:2003. Information technology -- Universal Multiple- 10646:2003. Information technology -- Universal Multiple-
Octet Coded Character Set (UCS)", 2003. Octet Coded Character Set (UCS)", 2003.
[ISO15924] [ISO15924]
International Organization for Standardization, "ISO International Organization for Standardization, "ISO
15924:2004. Information and documentation -- Codes for the 15924:2004. Information and documentation -- Codes for the
representation of names of scripts", January 2004. representation of names of scripts", January 2004.
skipping to change at page 54, line 48 skipping to change at page 54, line 48
[RFC2026] Bradner, S., "The Internet Standards Process -- Revision [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
3", BCP 9, RFC 2026, October 1996. 3", BCP 9, RFC 2026, October 1996.
[RFC2028] Hovey, R. and S. Bradner, "The Organizations Involved in [RFC2028] Hovey, R. and S. Bradner, "The Organizations Involved in
the IETF Standards Process", BCP 11, RFC 2028, the IETF Standards Process", BCP 11, RFC 2028,
October 1996. October 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2234bis]
Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", draft-crocker-abnf-rfc2234bis-00
(work in progress), March 2005.
[RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an
IANA Considerations Section in RFCs", BCP 26, RFC 2434, IANA Considerations Section in RFCs", BCP 26, RFC 2434,
October 1998. October 1998.
[RFC2860] Carpenter, B., Baker, F., and M. Roberts, "Memorandum of [RFC2860] Carpenter, B., Baker, F., and M. Roberts, "Memorandum of
Understanding Concerning the Technical Work of the Understanding Concerning the Technical Work of the
Internet Assigned Numbers Authority", RFC 2860, June 2000. Internet Assigned Numbers Authority", RFC 2860, June 2000.
[RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: [RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet:
Timestamps", RFC 3339, July 2002. Timestamps", RFC 3339, July 2002.
[RFC4234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 4234, October 2005.
[UN_M.49] Statistics Division, United Nations, "Standard Country or [UN_M.49] Statistics Division, United Nations, "Standard Country or
Area Codes for Statistical Use", UN Standard Country or Area Codes for Statistical Use", UN Standard Country or
Area Codes for Statistical Use, Revision 4 (United Nations Area Codes for Statistical Use, Revision 4 (United Nations
publication, Sales No. 98.XVII.9), June 1999. publication, Sales No. 98.XVII.9), June 1999.
9.2 Informative References 9.2. Informative References
[RFC1766] Alvestrand, H., "Tags for the Identification of [RFC1766] Alvestrand, H., "Tags for the Identification of
Languages", RFC 1766, March 1995. Languages", RFC 1766, March 1995.
[RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions) [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
Part Three: Message Header Extensions for Non-ASCII Text", Part Three: Message Header Extensions for Non-ASCII Text",
RFC 2047, November 1996. RFC 2047, November 1996.
[RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded
Word Extensions: Character Sets, Languages, and Word Extensions: Character Sets, Languages, and
skipping to change at page 56, line 29 skipping to change at page 57, line 5
ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory ISO 639 Joint Advisory Committee, "ISO 639 Joint Advisory
Committee: Working principles for ISO 639 maintenance", Committee: Working principles for ISO 639 maintenance",
March 2000, March 2000,
<http://www.loc.gov/standards/iso639-2/ <http://www.loc.gov/standards/iso639-2/
iso639jac_n3r.html>. iso639jac_n3r.html>.
[record-jar] [record-jar]
Raymond, E., "The Art of Unix Programming", 2003, Raymond, E., "The Art of Unix Programming", 2003,
<urn:isbn:0-13-142901-9>. <urn:isbn:0-13-142901-9>.
Authors' Addresses
Addison Phillips (editor)
Quest Software
Email: addison.phillips@quest.com
URI: http://www.inter-locale.com
Mark Davis (editor)
IBM
Email: mark.davis@us.ibm.com
Appendix A. Acknowledgements Appendix A. Acknowledgements
Any list of contributors is bound to be incomplete; please regard the Any list of contributors is bound to be incomplete; please regard the
following as only a selection from the group of people who have following as only a selection from the group of people who have
contributed to make this document what it is today. contributed to make this document what it is today.
The contributors to RFC 3066 and RFC 1766, the precursors of this The contributors to RFC 3066 and RFC 1766, the precursors of this
document, made enormous contributions directly or indirectly to this document, made enormous contributions directly or indirectly to this
document and are generally responsible for the success of language document and are generally responsible for the success of language
tags. tags.
skipping to change at page 61, line 5 skipping to change at page 61, line 5
de-419-DE (two region tags) de-419-DE (two region tags)
a-DE (use of a single character subtag in primary position; note a-DE (use of a single character subtag in primary position; note
that there are a few grandfathered tags that start with "i-" that that there are a few grandfathered tags that start with "i-" that
are valid) are valid)
ar-a-aaa-b-bbb-a-ccc (two extensions with same single letter ar-a-aaa-b-bbb-a-ccc (two extensions with same single letter
prefix) prefix)
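The last example above, a repeated extension singleton, can be detected mechanically. The following is a minimal sketch rather than a full validator: it only looks for repeated single-character subtags after the primary subtag and stops at the private use singleton "x". The function name is hypothetical.

```python
def duplicate_singletons(tag: str) -> list[str]:
    """Return the extension singletons that occur more than once in a tag."""
    seen = set()
    duplicates = []
    # Skip the primary subtag; a single character there is a separate error.
    for subtag in tag.lower().split("-")[1:]:
        if subtag == "x":
            # Private use subtags follow; single characters there are not singletons.
            break
        if len(subtag) == 1:
            if subtag in seen and subtag not in duplicates:
                duplicates.append(subtag)
            seen.add(subtag)
    return duplicates


print(duplicate_singletons("ar-a-aaa-b-bbb-a-ccc"))  # ['a']
print(duplicate_singletons("en-a-aaa-b-bbb"))        # []
```

The check does not cover the other two examples (two region subtags, or a single-character primary subtag), which require knowledge of subtag positions and of the registry.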
Authors' Addresses
Addison Phillips (editor)
Quest Software
Email: addison.phillips@quest.com
URI: http://www.inter-locale.com
Mark Davis (editor)
IBM
Email: mark.davis@us.ibm.com
Intellectual Property Statement Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79. found in BCP 78 and BCP 79.
 End of changes. 48 change blocks. 
92 lines changed or deleted 90 lines changed or added

This html diff was produced by rfcdiff 1.27, available from http://www.levkowetz.com/ietf/tools/rfcdiff/