draft-ietf-ltru-registry-01.txt   draft-ietf-ltru-registry-02.txt 
Network Working Group A. Phillips, Ed. Network Working Group A. Phillips, Ed.
Internet-Draft Quest Software Internet-Draft Quest Software
Expires: October 28, 2005 M. Davis, Ed. Expires: November 20, 2005 M. Davis, Ed.
IBM IBM
April 26, 2005 May 19, 2005
Tags for Identifying Languages Tags for Identifying Languages
draft-ietf-ltru-registry-01 draft-ietf-ltru-registry-02
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on October 28, 2005. This Internet-Draft will expire on November 20, 2005.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2005). Copyright (C) The Internet Society (2005).
Abstract Abstract
This document describes the structure, content, construction, and This document describes the structure, content, construction, and
semantics of language tags for use in cases where it is desirable to semantics of language tags for use in cases where it is desirable to
indicate the language used in an information object. It also indicate the language used in an information object. It also
skipping to change at page 2, line 24 skipping to change at page 2, line 24
2.2.3 Script Subtag . . . . . . . . . . . . . . . . . . . . 10 2.2.3 Script Subtag . . . . . . . . . . . . . . . . . . . . 10
2.2.4 Region Subtag . . . . . . . . . . . . . . . . . . . . 11 2.2.4 Region Subtag . . . . . . . . . . . . . . . . . . . . 11
2.2.5 Variant Subtags . . . . . . . . . . . . . . . . . . . 12 2.2.5 Variant Subtags . . . . . . . . . . . . . . . . . . . 12
2.2.6 Extension Subtags . . . . . . . . . . . . . . . . . . 13 2.2.6 Extension Subtags . . . . . . . . . . . . . . . . . . 13
2.2.7 Private Use Subtags . . . . . . . . . . . . . . . . . 14 2.2.7 Private Use Subtags . . . . . . . . . . . . . . . . . 14
2.2.8 Pre-Existing RFC 3066 Registrations . . . . . . . . . 15 2.2.8 Pre-Existing RFC 3066 Registrations . . . . . . . . . 15
2.2.9 Classes of Conformance . . . . . . . . . . . . . . . . 15 2.2.9 Classes of Conformance . . . . . . . . . . . . . . . . 15
3. Registry Format and Maintenance . . . . . . . . . . . . . . . 17 3. Registry Format and Maintenance . . . . . . . . . . . . . . . 17
3.1 Format of the IANA Language Subtag Registry . . . . . . . 17 3.1 Format of the IANA Language Subtag Registry . . . . . . . 17
3.2 Maintenance of the Registry . . . . . . . . . . . . . . . 21 3.2 Maintenance of the Registry . . . . . . . . . . . . . . . 21
3.3 Stability of IANA Registry Entries . . . . . . . . . . . . 22 3.3 Stability of IANA Registry Entries . . . . . . . . . . . . 23
3.4 Registration Procedure for Subtags . . . . . . . . . . . . 25 3.4 Registration Procedure for Subtags . . . . . . . . . . . . 26
3.5 Possibilities for Registration . . . . . . . . . . . . . . 29 3.5 Possibilities for Registration . . . . . . . . . . . . . . 29
3.6 Extensions and Extensions Namespace . . . . . . . . . . . 30 3.6 Extensions and Extensions Namespace . . . . . . . . . . . 31
3.7 Conversion of the RFC 3066 Language Tag Registry . . . . . 33 3.7 Conversion of the RFC 3066 Language Tag Registry . . . . . 34
4. Formation and Processing of Language Tags . . . . . . . . . . 36 4. Formation and Processing of Language Tags . . . . . . . . . . 37
4.1 Choice of Language Tag . . . . . . . . . . . . . . . . . . 36 4.1 Choice of Language Tag . . . . . . . . . . . . . . . . . . 37
4.2 Meaning of the Language Tag . . . . . . . . . . . . . . . 38 4.2 Meaning of the Language Tag . . . . . . . . . . . . . . . 39
4.3 Canonicalization of Language Tags . . . . . . . . . . . . 39 4.3 Canonicalization of Language Tags . . . . . . . . . . . . 40
4.4 Considerations for Private Use Subtags . . . . . . . . . . 40 4.4 Considerations for Private Use Subtags . . . . . . . . . . 41
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 42 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 43
6. Security Considerations . . . . . . . . . . . . . . . . . . . 43 6. Security Considerations . . . . . . . . . . . . . . . . . . . 44
7. Character Set Considerations . . . . . . . . . . . . . . . . . 44 7. Character Set Considerations . . . . . . . . . . . . . . . . . 45
8. Changes from RFC 3066 . . . . . . . . . . . . . . . . . . . . 45 8. Changes from RFC 3066 . . . . . . . . . . . . . . . . . . . . 46
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 50 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 50
9.1 Normative References . . . . . . . . . . . . . . . . . . . 50 9.1 Normative References . . . . . . . . . . . . . . . . . . . 50
9.2 Informative References . . . . . . . . . . . . . . . . . . 51 9.2 Informative References . . . . . . . . . . . . . . . . . . 51
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 52 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 52
A. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 53 A. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 53
B. Examples of Language Tags (Informative) . . . . . . . . . . . 54 B. Examples of Language Tags (Informative) . . . . . . . . . . . 54
C. Example Registry . . . . . . . . . . . . . . . . . . . . . . . 57 C. Example Registry . . . . . . . . . . . . . . . . . . . . . . . 57
Intellectual Property and Copyright Statements . . . . . . . . 61 Intellectual Property and Copyright Statements . . . . . . . . 61
1. Introduction 1. Introduction
skipping to change at page 3, line 47 skipping to change at page 3, line 47
This document specifies an identifier mechanism and a registration This document specifies an identifier mechanism and a registration
function for values to be used with that identifier mechanism. It function for values to be used with that identifier mechanism. It
also defines a mechanism for private use values and future extension. also defines a mechanism for private use values and future extension.
This document replaces RFC 3066, which replaced RFC 1766. For a list This document replaces RFC 3066, which replaced RFC 1766. For a list
of changes in this document, see Section 8. of changes in this document, see Section 8.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC 2119] [10]. document are to be interpreted as described in RFC 2119 [10].
2. The Language Tag 2. The Language Tag
2.1 Syntax 2.1 Syntax
The language tag is composed of one or more parts: A primary language The language tag is composed of one or more parts: A primary language
subtag and a (possibly empty) series of subsequent subtags. Subtags subtag and a (possibly empty) series of subsequent subtags. Subtags
are distinguished by their length, position in the subtag sequence, are distinguished by their length, position in the subtag sequence,
and content, so that each type of subtag can be recognized solely by and content, so that each type of subtag can be recognized solely by
these features. This makes it possible to construct a parser that these features. This makes it possible to construct a parser that
skipping to change at page 7, line 7 skipping to change at page 7, line 7
subtags are limited to a length of eight characters and the extlang, subtags are limited to a length of eight characters and the extlang,
script, and region subtags are limited to even fewer characters. See script, and region subtags are limited to even fewer characters. See
Section 4.1 for more information on selecting the most appropriate Section 4.1 for more information on selecting the most appropriate
Language Tag. Language Tag.
A conformant implementation MAY refuse to support the storage of A conformant implementation MAY refuse to support the storage of
language tags which exceed a specified length. For an example, see language tags which exceed a specified length. For an example, see
[RFC 2231] [22]. Any such limitation MUST be clearly documented, and [RFC 2231] [22]. Any such limitation MUST be clearly documented, and
such documentation SHOULD include the disposition of any longer tags such documentation SHOULD include the disposition of any longer tags
(for example, whether an error value is generated or the language tag (for example, whether an error value is generated or the language tag
is truncated). If truncation is permitted it SHOULD NOT permit a is truncated). If truncation is permitted it MUST NOT permit a
subtag to be divided. subtag to be divided.
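As an illustration of the requirement that truncation not divide a subtag, the following Python sketch shortens a tag only at hyphen boundaries. The function name `truncate_tag` and the `max_len` parameter are hypothetical and do not come from the draft. Dropping a trailing single-character subtag is an additional assumption, made so that the result does not end in a dangling singleton.

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Shorten a language tag to at most max_len characters, cutting only at hyphens."""
    subtags = tag.split("-")
    # Remove whole subtags from the end until the tag fits.
    while subtags and len("-".join(subtags)) > max_len:
        subtags.pop()
    # Assumption beyond the quoted text: do not leave a trailing
    # single-character subtag (an extension or private-use introducer).
    while len(subtags) > 1 and len(subtags[-1]) == 1:
        subtags.pop()
    if not subtags:
        raise ValueError("limit is shorter than the primary language subtag")
    return "-".join(subtags)
```

An implementation that instead reports an error for over-long tags would simply raise in place of the first loop; either disposition has to be documented.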
2.2 Language Subtag Sources and Interpretation 2.2 Language Subtag Sources and Interpretation
The namespace of language tags and their subtags is administered by The namespace of language tags and their subtags is administered by
the Internet Assigned Numbers Authority (IANA) [13] according to the the Internet Assigned Numbers Authority (IANA) [13] according to the
rules in Section 5 of this document. The registry maintained by IANA rules in Section 5 of this document. The registry maintained by IANA
is the source for valid subtags: other standards referenced in this is the source for valid subtags: other standards referenced in this
section provide the source material for that registry. section provide the source material for that registry.
skipping to change at page 17, line 47 skipping to change at page 17, line 47
Each field can be viewed as a single, logical line of ASCII Each field can be viewed as a single, logical line of ASCII
characters, comprising a field-name and a field-body separated by a characters, comprising a field-name and a field-body separated by a
COLON character (%x3A). For convenience, the field-body portion of COLON character (%x3A). For convenience, the field-body portion of
this conceptual entity can be split into a multiple-line this conceptual entity can be split into a multiple-line
representation; this is called "folding". The format of the registry representation; this is called "folding". The format of the registry
is described by the following ABNF (per [7]): is described by the following ABNF (per [7]):
registry = record *("%%" CRLF record) registry = record *("%%" CRLF record)
record = 1*( field-name *SP ":" *SP field-body CRLF ) record = 1*( field-name *SP ":" *SP field-body CRLF )
field-name = *(ALPHA/NUM/"-") field-name = *(ALPHA / DIGIT / "-")
field-body = *(ASCCHAR/LWSP) field-body = *(ASCCHAR/LWSP)
ASCCHAR = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26 ASCCHAR = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26
UNICHAR = "&#x" 2*6HEXDIG ";" UNICHAR = "&#x" 2*6HEXDIG ";"
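The ABNF above can be read as a simple line-oriented format. The following Python sketch is one possible reader, not a reference implementation: the name `parse_registry` is hypothetical, folded lines are assumed to begin with a space or tab, and no validation of field names or of the character repertoire is attempted. Fields are kept as an ordered list of pairs because a field such as 'Description' can occur more than once in a record.

```python
def parse_registry(text: str) -> list[list[tuple[str, str]]]:
    """Split record-jar text into records, each a list of (field-name, field-body)."""
    records: list[list[tuple[str, str]]] = []
    current: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue                      # ignore empty lines
        if line.strip() == "%%":          # record separator
            if current:
                records.append(current)
            current = []
        elif line[0] in " \t" and current:
            # Folded line: append it to the body of the previous field.
            name, body = current[-1]
            current[-1] = (name, body + " " + line.strip())
        elif ":" in line:
            name, _, body = line.partition(":")
            current.append((name.strip(), body.strip()))
    if current:
        records.append(current)
    return records
```

Splitting on the first COLON only means that a field-body containing further colons, such as a URL, is preserved intact.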
The sequence '..' (%x2E.2E) in a field-body denotes a range of The sequence '..' (%x2E.2E) in a field-body denotes a range of
values. Such a range represents all subtags of the same length that values. Such a range represents all subtags of the same length that
are alphabetically within that range, including the values explicitly are alphabetically within that range, including the values explicitly
mentioned. For example 'a..c' denotes the values 'a', 'b', and 'c'. mentioned. For example 'a..c' denotes the values 'a', 'b', and 'c'.
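A minimal Python sketch of the range notation follows. The name `expand_range` is hypothetical, and the sketch assumes that both endpoints consist of lowercase ASCII letters only; ranges over mixed-case or numeric subtags would need additional handling.

```python
from itertools import product
from string import ascii_lowercase


def expand_range(field_body: str) -> list[str]:
    """Expand 'first..last' into every same-length value between the endpoints."""
    first, sep, last = field_body.partition("..")
    if not sep:
        return [field_body]               # not a range, a single value
    if len(first) != len(last) or first > last:
        raise ValueError("endpoints must have equal length and be in order")
    if not set(first + last) <= set(ascii_lowercase):
        raise ValueError("this sketch handles lowercase ASCII letters only")
    # product() yields candidates in alphabetical order.
    candidates = ("".join(p) for p in product(ascii_lowercase, repeat=len(first)))
    return [s for s in candidates if first <= s <= last]
```

Generating every candidate of the given length is wasteful but keeps the logic obvious; subtags are short enough that this is acceptable for illustration.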
Characters from outside the US-ASCII repertoire, as well as the Characters from outside the US-ASCII repertoire, as well as the
skipping to change at page 19, line 15 skipping to change at page 19, line 15
o Description o Description
* Description's field-value contains a non-normative description * Description's field-value contains a non-normative description
of the subtag or tag. of the subtag or tag.
o Added o Added
* Added's field-value contains the date the record was added to * Added's field-value contains the date the record was added to
the registry. the registry.
draft-ietf-ltru-registry-01:

The field 'Description' MAY appear more than one time. The
'Description' field must contain a description of the tag being
registered written or transcribed into the Latin script; it may also
include a description in a non-Latin script. The 'Description' field
is used for identification purposes and should not be taken to
represent the actual native name of the language or variation or to
be in any particular language. Most descriptions are taken directly
from source standards such as ISO 639 or ISO 3166.

draft-ietf-ltru-registry-02:

The 'Subtag' or 'Tag' field MUST use lowercase letters to form the
subtag or tag, with two exceptions. Subtags whose 'Type' field is
'script' (in other words, subtags defined by ISO 15924) MUST use
titlecase. Subtags whose 'Type' field is 'region' (in other words,
subtags defined by ISO 3166) MUST use uppercase. These exceptions
mirror the use of case in the underlying standards.

The field 'Description' MAY appear more than one time. At least one
of the 'Description' fields must contain a description of the tag
being registered written or transcribed into the Latin script; the
same or additional fields may also include a description in a
non-Latin script. The 'Description' field is used for identification
purposes and should not be taken to represent the actual native name
of the language or variation or to be in any particular language.
Most descriptions are taken directly from source standards such as
ISO 639 or ISO 3166.
Note: Descriptions in registry entries that correspond to ISO 639, Note: Descriptions in registry entries that correspond to ISO 639,
ISO 15924, ISO 3166 or UN M.49 codes are intended only to indicate ISO 15924, ISO 3166 or UN M.49 codes are intended only to indicate
the meaning of that identifier as defined in the source standard at the meaning of that identifier as defined in the source standard at
the time it was added to the registry. The description does not the time it was added to the registry. The description does not
replace the content of the source standard itself. The descriptions replace the content of the source standard itself. The descriptions
are not intended to be the English localized names for the subtags. are not intended to be the English localized names for the subtags.
Localization or translation of language tag and subtag descriptions Localization or translation of language tag and subtag descriptions
is out of scope of this document. is out of scope of this document.
skipping to change at page 21, line 30 skipping to change at page 21, line 39
guarantee of stability is provided. The content of this field is not guarantee of stability is provided. The content of this field is not
restricted, except by the need to register the information, the restricted, except by the need to register the information, the
suitability of the request, and by reasonable practical size suitability of the request, and by reasonable practical size
limitations. Long screeds about a particular subtag are frowned limitations. Long screeds about a particular subtag are frowned
upon. upon.
draft-ietf-ltru-registry-01:

The field 'Suppress-Script' MUST only appear in records whose 'Type'
field-value is 'language'. This field may appear at most one time in
a record. This field indicates a script used to write the
overwhelming majority of documents for the given language and which
therefore adds no distinguishing information to a language tag. For
example, virtually all Icelandic documents are written in the Latin
script, making the subtag 'Latn' redundant in the tag "is-Latn".

draft-ietf-ltru-registry-02:

The field 'Suppress-Script' MUST only appear in records whose 'Type'
field-value is 'language'. This field may appear at most one time in
a record. This field indicates a script used to write the
overwhelming majority of documents for the given language and which
therefore adds no distinguishing information to a language tag. It
helps ensure greater compatibility between the language tags
generated according to the rules in this document and language tags
and tag processors or consumers based on RFC 3066. For example,
virtually all Icelandic documents are written in the Latin script,
making the subtag 'Latn' redundant in the tag "is-Latn".
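The effect of 'Suppress-Script' on tag generation can be sketched in Python as follows. The table `SUPPRESS_SCRIPT` is a hypothetical one-entry excerpt built from the Icelandic example above, not real registry data, and the function name `drop_suppressed_script` is likewise an assumption. The sketch also assumes that the script subtag, if present, immediately follows the primary language subtag (that is, no extended language subtags are present).

```python
# Hypothetical excerpt: primary language subtag -> its Suppress-Script value.
SUPPRESS_SCRIPT = {"is": "Latn"}


def drop_suppressed_script(tag: str) -> str:
    """Remove a script subtag that the registry marks as redundant for the language."""
    subtags = tag.split("-")
    # A four-letter alphabetic subtag in second position is taken to be a script.
    if len(subtags) >= 2 and len(subtags[1]) == 4 and subtags[1].isalpha():
        suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
        if suppressed is not None and suppressed.lower() == subtags[1].lower():
            del subtags[1]
    return "-".join(subtags)
```

A script subtag that differs from the recorded value, such as a hypothetical "is-Cyrl", carries real information and is left in place.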
For examples of registry entries and their format, see Appendix C. For examples of registry entries and their format, see Appendix C.
3.2 Maintenance of the Registry 3.2 Maintenance of the Registry
Maintenance of the registry requires that as new codes are assigned Maintenance of the registry requires that as new codes are assigned
by ISO 639, ISO 15924, and ISO 3166, the Language Subtag Reviewer by ISO 639, ISO 15924, and ISO 3166, the Language Subtag Reviewer
will evaluate each assignment, determine whether it conflicts with will evaluate each assignment, determine whether it conflicts with
existing registry entries, and submit the information to IANA for existing registry entries, and submit the information to IANA for
inclusion in the registry. If an assignment takes place and the inclusion in the registry. If an assignment takes place and the
skipping to change at page 22, line 38 skipping to change at page 23, line 4
Subtag: nedis Subtag: nedis
Description: Natisone dialect Description: Natisone dialect
Description: Nadiza dialect Description: Nadiza dialect
Added: 2003-10-09 Added: 2003-10-09
Recommended-Prefix: sl Recommended-Prefix: sl
Comments: This is a comment shown Comments: This is a comment shown
as an example. as an example.
%% %%
Figure 4 Figure 4
Whenever an entry is created or modified in the registry, the 'File- Whenever an entry is created or modified in the registry, the 'File-
Date' record at the start of the registry is updated to reflect the Date' record at the start of the registry is updated to reflect the
most recent modification date in the RFC 3339 [14] "full-date" most recent modification date in the RFC 3339 [14] "full-date"
format. format.
Values in the 'Subtag' field must be lowercase except as provided for
in Section 3.1.
3.3 Stability of IANA Registry Entries 3.3 Stability of IANA Registry Entries
The stability of entries and their meaning in the registry is The stability of entries and their meaning in the registry is
critical to the long term stability of language tags. The rules in critical to the long term stability of language tags. The rules in
this section guarantee that a specific language tag's meaning is this section guarantee that a specific language tag's meaning is
stable over time and will not change and that the choice of language stable over time and will not change and that the choice of language
tag for specific content is also stable over time. tag for specific content is also stable over time.
These rules specifically deal with how changes to codes (including These rules specifically deal with how changes to codes (including
withdrawal and deprecation of codes) maintained by ISO 639, ISO withdrawal and deprecation of codes) maintained by ISO 639, ISO
skipping to change at page 28, line 41 skipping to change at page 29, line 23
Language Subtag Reviewer decides whether there is consensus to update Language Subtag Reviewer decides whether there is consensus to update
the registration following the two week review period; normally the registration following the two week review period; normally
objections by the original registrant will carry extra weight in objections by the original registrant will carry extra weight in
forming such a consensus. forming such a consensus.
Registrations are permanent and stable. Once registered, subtags Registrations are permanent and stable. Once registered, subtags
will not be removed from the registry and will remain the canonical will not be removed from the registry and will remain the canonical
method of referring to a specific language or variant. This method of referring to a specific language or variant. This
provision does not apply to grandfathered tags, which may become provision does not apply to grandfathered tags, which may become
deprecated due to registration of subtags. For example, the tag deprecated due to registration of subtags. For example, the tag
"i-navajo" is deprecated in favor of the ISO 639-1 based subtag 'nv'. "i-navajo" is deprecated in favor of the tag "nv", which consists of
the single primary language subtag 'nv'.
Note: The purpose of the "published description" in the registration Note: The purpose of the "published description" in the registration
form is intended as an aid to people trying to verify whether a form is intended as an aid to people trying to verify whether a
language is registered or what language or language variation a language is registered or what language or language variation a
particular subtag refers to. In most cases, reference to an particular subtag refers to. In most cases, reference to an
authoritative grammar or dictionary of that language will be useful; authoritative grammar or dictionary of that language will be useful;
in cases where no such work exists, other well known works describing in cases where no such work exists, other well known works describing
that language or in that language may be appropriate. The subtag that language or in that language may be appropriate. The subtag
reviewer decides what constitutes "good enough" reference material. reviewer decides what constitutes "good enough" reference material.
This requirement is not intended to exclude particular languages or This requirement is not intended to exclude particular languages or
skipping to change at page 30, line 48 skipping to change at page 31, line 31
United Nations, Room DC2-1620 United Nations, Room DC2-1620
New York, NY 10017, USA New York, NY 10017, USA
Fax: +1-212-963-0623 Fax: +1-212-963-0623
E-mail: statistics@un.org E-mail: statistics@un.org
URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm
3.6 Extensions and Extensions Namespace 3.6 Extensions and Extensions Namespace
Extension subtags are those introduced by single-letter subtags other Extension subtags are those introduced by single-letter subtags other
than 'x-'. They are reserved for the generation of identifiers which than 'x'. They are reserved for the generation of identifiers which
contain a language component, and are compatible with applications contain a language component, and are compatible with applications
understand language tags. For example, they might be used to define that understand language tags. For example, they might be used to
locale identifiers, which are generally based on language. define locale identifiers, which are generally based on language.
The structure and form of extensions are defined by this document so The structure and form of extensions are defined by this document so
that implementations can be created that are forward compatible with that implementations can be created that are forward compatible with
applications that may be created using single-letter subtags in the applications that may be created using single-letter subtags in the
future. In addition, defining a mechanism for maintaining single- future. In addition, defining a mechanism for maintaining single-
letter subtags will lend to the stability of this document by letter subtags will lend to the stability of this document by
reducing the likely need for future revisions or updates. reducing the likely need for future revisions or updates.
Allocation of a single-letter subtag shall take the form of an RFC Allocation of a single-letter subtag shall take the form of an RFC
defining the name, purpose, processes, and procedures for maintaining defining the name, purpose, processes, and procedures for maintaining
skipping to change at page 31, line 52 skipping to change at page 32, line 35
once defined by a specification, MUST NOT be retracted or change once defined by a specification, MUST NOT be retracted or change
in meaning in any substantial way. in meaning in any substantial way.
o The specification MUST include in a separate section the o The specification MUST include in a separate section the
registration form reproduced in this section (below) to be used in registration form reproduced in this section (below) to be used in
registering the extension upon publication as an RFC. registering the extension upon publication as an RFC.
o IANA MUST be informed of changes to the contact information and o IANA MUST be informed of changes to the contact information and
URL for the specification. URL for the specification.
o Modified the latin-script requirement on the 'Description' field
so that "at least one Description field" must contain a Latin
transcription. (A.Phillips)
IANA will maintain a registry of allocated single-letter (singleton) IANA will maintain a registry of allocated single-letter (singleton)
subtags. This registry will use the record-jar format described by subtags. This registry will use the record-jar format described by
the ABNF in Section 3.1. Upon publication of an extension as an RFC, the ABNF in Section 3.1. Upon publication of an extension as an RFC,
the maintaining authority defined in the RFC must forward this the maintaining authority defined in the RFC must forward this
registration form to iesg@ietf.org, who will forward the request to registration form to iesg@ietf.org, who will forward the request to
iana@iana.org. The maintaining authority of the extension MUST iana@iana.org. The maintaining authority of the extension MUST
maintain the accuracy of the record by sending an updated full copy maintain the accuracy of the record by sending an updated full copy
of the record to iana@iana.org with the subject line "LANGUAGE TAG of the record to iana@iana.org with the subject line "LANGUAGE TAG
EXTENSION UPDATE" whenever content changes. Only the 'Comments', EXTENSION UPDATE" whenever content changes. Only the 'Comments',
'Contact_Email', 'Mailing_List', and 'URL' fields may be modified in 'Contact_Email', 'Mailing_List', and 'URL' fields may be modified in
skipping to change at page 36, line 32 skipping to change at page 37, line 32
Of particular note, many applications can benefit from the use of Of particular note, many applications can benefit from the use of
script subtags in language tags, as long as the use is consistent for script subtags in language tags, as long as the use is consistent for
a given context. Script subtags were not formally defined in RFC a given context. Script subtags were not formally defined in RFC
3066 and their use may affect matching and subtag identification by 3066 and their use may affect matching and subtag identification by
implementations of RFC 3066, as these subtags appear between the implementations of RFC 3066, as these subtags appear between the
primary language and region subtags. For example, if a user requests primary language and region subtags. For example, if a user requests
content in an implementation of Section 2.5 of RFC 3066 [23] using content in an implementation of Section 2.5 of RFC 3066 [23] using
the language range "en-US", content labeled "en-Latn-US" will not the language range "en-US", content labeled "en-Latn-US" will not
match the request. Therefore it is important to know when script match the request. Therefore it is important to know when script
subtags will customarily be used and when they should not be used. subtags will customarily be used and when they should not be used.
In the registry, the Suppress-Script field helps ensure greater
compatibility between the language tags generated according to the
rules in this document and language tags and tag processors or
consumers based on RFC 3066 by defining when users should generally
not include a script subtag with a particular primary language
subtag.
Extended language subtags (type 'extlang' in the registry, see Extended language subtags (type 'extlang' in the registry, see
Section 3.1) also appear between the primary language and region Section 3.1) also appear between the primary language and region
subtags and are reserved for future standardization. Applications subtags and are reserved for future standardization. Applications
may benefit from their judicious use in forming language tags in the may benefit from their judicious use in forming language tags in the
future and similar recommendations are expected to apply to their use future and similar recommendations are expected to apply to their use
as apply to script subtags. as apply to script subtags.
Standards, protocols and applications that reference this document Standards, protocols and applications that reference this document
normatively but apply different rules to the ones given in this normatively but apply different rules to the ones given in this
skipping to change at page 39, line 29 skipping to change at page 40, line 33
Since a particular language tag may be used in many processes, Since a particular language tag may be used in many processes,
language tags SHOULD always be created or generated in a canonical language tags SHOULD always be created or generated in a canonical
form. form.
A language tag is in canonical form when: A language tag is in canonical form when:
1. The tag is well-formed according to the rules in Section 2.1 and 1. The tag is well-formed according to the rules in Section 2.1 and
Section 2.2. Section 2.2.
2. None of the subtags in the language tag has a canonical_value 2. None of the subtags in the language tag has a Canonical-Value
mapping in the IANA registry (see Section 3.1). Subtags with a mapping in the IANA registry (see Section 3.1). Subtags with a
canonical_value mapping MUST be replaced with their mapping in Canonical-Value mapping MUST be replaced with their mapping in
order to canonicalize the tag. order to canonicalize the tag.
3. If more than one extension subtag sequence exists, the extension 3. If more than one extension subtag sequence exists, the extension
sequences are ordered into case-insensitive ASCII order by sequences are ordered into case-insensitive ASCII order by
singleton subtag. singleton subtag.
Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical
form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in
canonical form. canonical form.
Example: The language tag "en-NH" (English as used in the New Example: The language tag "en-NH" (English as used in the New
Hebrides) is not canonical because the 'NH' subtag has a canonical Hebrides) is not canonical because the 'NH' subtag has a canonical
mapping to 'VU' (Vanuatu). mapping to 'VU' (Vanuatu).
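The canonical-form conditions above can be sketched in code. The following is a minimal illustration, not part of the draft: it assumes the input tag is already well-formed, and the function name canonicalize and the canonical_values dictionary are hypothetical stand-ins for a lookup against the IANA registry. As a simplification, the mapping is applied without regard to subtag type.

```python
def canonicalize(tag, canonical_values=None):
    """Return a well-formed tag with mapped subtags replaced and
    extension sequences ordered by singleton (case-insensitively).

    canonical_values is a hypothetical dict keyed by lowercase subtag;
    real data would come from the IANA registry.
    """
    canonical_values = canonical_values or {}
    subtags = tag.split("-")

    # Tags that begin with a singleton (private use, grandfathered 'i-')
    # are returned unchanged in this sketch.
    if len(subtags[0]) == 1:
        return tag

    # Subtags before the first singleton: apply Canonical-Value mappings.
    prefix = []
    i = 0
    while i < len(subtags) and len(subtags[i]) != 1:
        prefix.append(canonical_values.get(subtags[i].lower(), subtags[i]))
        i += 1

    # Collect extension sequences; 'x' starts private use, which runs
    # to the end of the tag and is never reordered.
    extensions = []
    private_use = []
    while i < len(subtags):
        if subtags[i].lower() == "x":
            private_use = subtags[i:]
            break
        sequence = [subtags[i]]
        i += 1
        while i < len(subtags) and len(subtags[i]) != 1:
            sequence.append(subtags[i])
            i += 1
        extensions.append(sequence)

    # Order extension sequences by singleton, ignoring case.
    extensions.sort(key=lambda seq: seq[0].lower())
    ordered = [s for seq in extensions for s in seq]
    return "-".join(prefix + ordered + private_use)


print(canonicalize("en-B-ccc-bbb-A-aaa-X-xyz"))
print(canonicalize("en-NH", {"nh": "VU"}))
```

The sketch leaves the case of each subtag as it was supplied, since canonical form as defined here does not depend on case.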
Note: Canonicalization of language tags does not imply anything about Canonicalization of language tags does not imply anything about the
the use of upper or lowercase letter in subtags as described in use of upper or lowercase letters when processing or comparing
Section 2.1. All comparisons MUST be performed in a case-insensitive subtags (and as described in Section 2.1). All comparisons MUST be
manner. performed in a case-insensitive manner.
When performing canonicalization of language tags, processors MAY
optionally regularize the case of the subtags, following the case
used in the registry. Note that this corresponds to the following
casing rules: uppercase all non-initial two-letter subtags; titlecase
all non-initial four-letter subtags; lowercase everything else.
Note: Case folding of ASCII letters in certain locales, unless
carefully handled, may produce non-ASCII character values. The
Unicode Character Database file "SpecialCasing.txt" defines the
specific cases that are known to cause problems with this. In
particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
Implementers should specify a locale-neutral casing operation to
ensure that case folding of subtags does not produce this value,
which is illegal in language tags. For example, if one were to
uppercase the region subtag 'in' using Turkish locale rules, the
sequence U+0130 U+004E would result instead of the expected 'IN'.
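The casing rules and the locale warning above can be illustrated with a short sketch. It is not part of the draft: the names ascii_upper, ascii_lower and regularize_case are hypothetical, and the code applies the three casing rules exactly as stated in the text, to every non-initial subtag. Case conversion is done with explicit ASCII-only translation tables so that no locale setting can influence the result.

```python
import string

# ASCII-only translation tables: only A-Z and a-z are ever changed,
# so 'i' always maps to 'I' (U+0049), never to U+0130.
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_upper(text):
    return text.translate(_TO_UPPER)


def ascii_lower(text):
    return text.translate(_TO_LOWER)


def regularize_case(tag):
    """Apply the registry casing conventions to a well-formed tag."""
    result = []
    for index, subtag in enumerate(tag.split("-")):
        if index > 0 and len(subtag) == 2:
            # Non-initial two-letter subtag: uppercase.
            result.append(ascii_upper(subtag))
        elif index > 0 and len(subtag) == 4:
            # Non-initial four-letter subtag: titlecase.
            result.append(ascii_upper(subtag[:1]) + ascii_lower(subtag[1:]))
        else:
            # Everything else: lowercase.
            result.append(ascii_lower(subtag))
    return "-".join(result)


print(regularize_case("EN-latn-in"))
```

Because the tables cover only the 26 ASCII letters, the region subtag 'in' becomes 'IN' regardless of the environment in which the code runs.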
Note: if the field 'Deprecated' appears in a registry record without Note: if the field 'Deprecated' appears in a registry record without
an accompanying 'Canonical' field, then that tag or subtag is an accompanying 'Canonical' field, then that tag or subtag is
deprecated without a replacement. Validating processors SHOULD NOT deprecated without a replacement. Validating processors SHOULD NOT
generate tags that include these values, although the values are generate tags that include these values, although the values are
canonical when they appear in a language tag. canonical when they appear in a language tag.
An extension MUST define any relationships that may exist between the An extension MUST define any relationships that may exist between the
various subtags in the extension and thus MAY define an alternate various subtags in the extension and thus MAY define an alternate
canonicalization scheme for the extension's subtags. Extensions MAY canonicalization scheme for the extension's subtags. Extensions MAY
skipping to change at page 42, line 37 skipping to change at page 43, line 37
inserting or replacing whole records preformatted for IANA by the inserting or replacing whole records preformatted for IANA by the
Language Subtag Reviewer as described in Section 3.2 of this Language Subtag Reviewer as described in Section 3.2 of this
document. Each record will be sent to iana@iana.org with a subject document. Each record will be sent to iana@iana.org with a subject
line indicating whether the enclosed record is an insertion (of a new line indicating whether the enclosed record is an insertion (of a new
record) or a replacement of an existing record which has a Type and record) or a replacement of an existing record which has a Type and
Subtag (or Tag) field that exactly matches the record sent. Records Subtag (or Tag) field that exactly matches the record sent. Records
cannot be deleted from the registry. cannot be deleted from the registry.
The Language Tag Extensions registry will also be generated and sent The Language Tag Extensions registry will also be generated and sent
to IANA as described in Section 3.6. This registry may contain at to IANA as described in Section 3.6. This registry may contain at
most 25 records and thus changes to this registry are expected to be most 35 records and thus changes to this registry are expected to be
very infrequent. very infrequent.
Future work by IANA on the Language Tag Extensions Registry is Future work by IANA on the Language Tag Extensions Registry is
limited to two cases. First, the IESG may request that new records limited to two cases. First, the IESG may request that new records
be inserted into this registry from time to time. These requests be inserted into this registry from time to time. These requests
will include the record to insert in the exact format described in will include the record to insert in the exact format described in
Section 3.6. In addition, there may be occasional requests from the Section 3.6. In addition, there may be occasional requests from the
maintaining authority for a specific extension to update the contact maintaining authority for a specific extension to update the contact
information or URLs in the record. These requests MUST include the information or URLs in the record. These requests MUST include the
complete, updated record. IANA is not responsible for validating the complete, updated record. IANA is not responsible for validating the
skipping to change at page 46, line 49 skipping to change at page 47, line 49
IANA registry. This allows for robust implementation and ease of IANA registry. This allows for robust implementation and ease of
maintenance. The language subtag registry becomes the canonical maintenance. The language subtag registry becomes the canonical
source for forming language tags. source for forming language tags.
o Provides a process that guarantees stability of language tags, by o Provides a process that guarantees stability of language tags, by
handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in
the event that they register a previously used value for a new the event that they register a previously used value for a new
purpose. purpose.
o Allows ISO 15924 script code subtags and allows them to be used o Allows ISO 15924 script code subtags and allows them to be used
generatively. Adds the concept of a variant subtag and allows generatively. Defines a method for indicating in the registry
variants to be used generatively. Adds the ability to use a class when script subtags are necessary for a given language tag.
of UN tags as regions.
o Adds the concept of a variant subtag and allows variants to be
used generatively.
o Adds the ability to use a class of UN M.49 tags for supra-
national regions and to resolve conflicts in the assignment of ISO
3166 codes.
o Defines the private-use tags in ISO 639, ISO 15924, and ISO 3166 o Defines the private-use tags in ISO 639, ISO 15924, and ISO 3166
as the mechanism for creating private-use language, script, and as the mechanism for creating private-use language, script, and
region subtags respectively. region subtags respectively.
o Adds a well-defined extension mechanism. o Adds a well-defined extension mechanism.
o Defines an extended language subtag, possibly for use with certain o Defines an extended language subtag, possibly for use with certain
anticipated features of ISO 639-3. anticipated features of ISO 639-3.
Ed Note: The following items are provided for the convenience of Ed Note: The following items are provided for the convenience of
reviewers and will be removed from the final document. reviewers and will be removed from the final document.
Changes between draft-ietf-ltru-registry-00 and this version are: Changes between draft-ietf-ltru-registry-01 and this version are:
o Updated the ABNF for singleton to make it conform to RFC 2234 and
pass the Fenner parser (F.Ellermann)
o Split the references into informative and normative lists.
Eliminated dead references carried forward from previous versions
of this document. (A.Phillips)
o Added a reference to RFC 3552 (BCP 72) to the Security
Considerations section (I.McDonald)
o Modified the first sentence in Section 2.1.1 from "on the number
of size of subtags in a Language Tag" to be proper English and
convey more meaning. (A.Phillips)
o Various examples that used the variant 'boont' were changed to use
the variant 'scouse' instead. (J.Cowan)
o Added an additional example ("en-a-bbb-x-a-ccc") to the extension/
singleton rules in Section 2.2.6 to illustrate that singletons can
recur in private use sequences (A.Phillips)
o Modified the sentence describing the possibilities for variant
registration (see Section 3.5) to include transliterations and
other transformations per discussion on the list. (M.T. Carrasco
Benitez)
o Converted the format of the registry to record-jar format. This
substantially replaces section 3.1 (R.Presuhn)
o Substantially revised the rules for registry creation to reflect
the Date A/B boundaries on adopting ISO 3166 codes (J.Cowan)
o Modified the registration process section and form to deal with
both new additions and revisions of records, as well as making
life easier on the Subtag Reviewer by matching the fields to the
registry format. (A.Phillips)
o Changed the reference to RFC 2234 to RFC 2234bis (recently
adopted). (S.Hollenbeck)
o Modifications to make this document conformant with RFC 3978
(recently adopted). (R.Presuhn)
o Added an informative reference to XML Schema 1.0 Part 2: Second
Edition in this section. (J.Morfin)
o Expanded the jargon-ish 'extlang' to "extended language" in this
section. (J.Morfin)
o Corrected an egregious error in the ABNF (%x6A -> %x5A in one of
the ranges) (A.Phillips)
o Split Maintenance of the Registry from Format of the Registry
(A.Phillips)
o Revision of Section 3.4 to make it consistent with the new
Section 3.2. (A.Phillips)
o Separated IANA Considerations section from the registry definition
and registration procedures. ()
o Added additional choice information dealing with scripts and
extlangs. These items were also moved to a new section following
the registry format because of interdependence.
o Updated the IANA Considerations section.
o Added appeal and maintenance requirements to the extensions
section (Section 3.6). (A.Phillips)
o Added an additional bullet point to Section 3.5 enumerating the
changes that can be registered to a record (previously we only
listed the options for new subtags). (A.Phillips)
o Added the phrase ", as well as the possibility of other ISO 639
parts becoming useful for the formation of language tags in the
future" to this section in anticipation of revising the ABNF to
allow for the possibility of ISO 639-6 being used in language tags
in a future revision of this document. (D.Garside)
o Added the concept of 'Suppress-Script' to Section 4.1, as well as
to the registry format in Section 3.1, Section 3.3 and
Section 3.2. (many)
o Added text requiring the I-D that defines an extension to choose a
letter (and allowing the IESG to change it if necessary).
(D.Ewell?)
o Removed the ABNF notes from the text about case insensitivity
(F.Ellermann)
o Removed the second, rather repetitive reference to Appendix B in
Section 2.1 (A.Phillips)
o Fixed missing whitespace in Section 2.1 (F.Ellermann)
o Changed "empty" to "omitted" in Section 2.2.1 (F.Ellermann)
o Changed the intro to Section 2.2.1 and otherwise tugged at that
section to deal with i-* grandfathered items. (F.Ellermann)
o Reserved alpha4 language subtags for future standardization. o Minor updates to the changes section (the text just above) to
(D.Garside) reflect various updates in the WG drafts (A.Phillips)
o Incorporate changes to be consistent with RFC 3978, including the o Minor change to the section on the extensions registry (because
new xml2rfc processor. Note that this has an effect on the ABNF, there can be 35, not 25, entries maximum). (D.Ewell)
since some of the comments were too wide previously (comments were
revised to fit the 72 character maximum). (S.Hollenbeck)
o Remove the Latin-1 restriction on the 'Description' field. o Changed "SHOULD NOT permit a subtag to be divided" to MUST NOT.
Provide guidance for registration of content, including a (#944) (R.Presuhn)
requirement for at least one representation in the Latin script.
(F.Ellermann, A.Phillips)
o Make the variant subtlety less so. (F.Ellermann) o Added text to Section 3.1 and Section 4.1 describing the rationale
for Suppress-Script. Both sentences are slight rewordings of this
text suggested in the email thread: "This field helps ensure
greater compatibility between the language tags generated
according to the rules in this document and language tags and tag
processors or consumers based on RFC 3066." (#954) (F.Ellermann,
A.Phillips)
o Various 'you' removals and cleanup (M.Davis) o Added text about case folding during canonicalization. This also
includes rules in Section 3.2 for casing of registry entries, as
well as the insertion of the text permitting case normalization in
Section 4.3 and the warning about locale-specific casing
operations in the same section. (#985) (F.Ellermann, J.Cowan,
A.Phillips)
o Inserted additional non-normative caveat about the 'MUL' subtag o Fixed the reference to Canonical-Value in Section 4.3.
(A.Phillips) (A.Phillips)
o In Section 3.4, changed the reference from the subtag 'nv' to the
tag "nv" to be consistent with the wording in Section 3.1. (part
of #954) (D.Ewell)
o Various editorial edits (J.Cowan) o Added missing word 'that' in Section 3.6 (A.Phillips)
o Use normative language when giving permission to not store long
language tags in Section 2.1.1. (J.Cowan)
9. References 9. References
9.1 Normative References 9.1 Normative References
[1] International Organization for Standardization, "ISO 639- [1] International Organization for Standardization, "ISO 639-
1:2002, Codes for the representation of names of languages -- 1:2002, Codes for the representation of names of languages --
Part 1: Alpha-2 code", ISO Standard 639, 2002. Part 1: Alpha-2 code", ISO Standard 639, 2002.
[2] International Organization for Standardization, "ISO 639-2:1998 [2] International Organization for Standardization, "ISO 639-2:1998
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/