draft-ietf-ltru-registry-07.txt   draft-ietf-ltru-registry-08.txt 
Network Working Group A. Phillips, Ed. Network Working Group A. Phillips, Ed.
Internet-Draft Quest Software Internet-Draft Quest Software
Expires: December 26, 2005 M. Davis, Ed. Expires: December 30, 2005 M. Davis, Ed.
IBM IBM
June 24, 2005 June 28, 2005
Tags for Identifying Languages Tags for Identifying Languages
draft-ietf-ltru-registry-07 draft-ietf-ltru-registry-08
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 26, 2005. This Internet-Draft will expire on December 30, 2005.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2005). Copyright (C) The Internet Society (2005).
Abstract Abstract
This document describes the structure, content, construction, and This document describes the structure, content, construction, and
semantics of language tags for use in cases where it is desirable to semantics of language tags for use in cases where it is desirable to
indicate the language used in an information object. It also indicate the language used in an information object. It also
describes how to register values for use in language tags and the describes how to register values for use in language tags and the
creation of user defined extensions for private interchange. This creation of user defined extensions for private interchange.
document obsoletes RFC 3066 (which replaced RFC 1766).
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. The Language Tag . . . . . . . . . . . . . . . . . . . . . . . 4 2. The Language Tag . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Language Subtag Sources and Interpretation . . . . . . . . 6 2.2 Language Subtag Sources and Interpretation . . . . . . . . 6
2.2.1 Primary Language Subtag . . . . . . . . . . . . . . . 7 2.2.1 Primary Language Subtag . . . . . . . . . . . . . . . 7
2.2.2 Extended Language Subtags . . . . . . . . . . . . . . 9 2.2.2 Extended Language Subtags . . . . . . . . . . . . . . 9
2.2.3 Script Subtag . . . . . . . . . . . . . . . . . . . . 10 2.2.3 Script Subtag . . . . . . . . . . . . . . . . . . . . 10
2.2.4 Region Subtag . . . . . . . . . . . . . . . . . . . . 11 2.2.4 Region Subtag . . . . . . . . . . . . . . . . . . . . 11
2.2.5 Variant Subtags . . . . . . . . . . . . . . . . . . . 12 2.2.5 Variant Subtags . . . . . . . . . . . . . . . . . . . 13
2.2.6 Extension Subtags . . . . . . . . . . . . . . . . . . 14 2.2.6 Extension Subtags . . . . . . . . . . . . . . . . . . 14
2.2.7 Private Use Subtags . . . . . . . . . . . . . . . . . 15 2.2.7 Private Use Subtags . . . . . . . . . . . . . . . . . 15
2.2.8 Pre-Existing RFC 3066 Registrations . . . . . . . . . 15 2.2.8 Pre-Existing RFC 3066 Registrations . . . . . . . . . 15
2.2.9 Classes of Conformance . . . . . . . . . . . . . . . . 16 2.2.9 Classes of Conformance . . . . . . . . . . . . . . . . 16
3. Registry Format and Maintenance . . . . . . . . . . . . . . . 18 3. Registry Format and Maintenance . . . . . . . . . . . . . . . 18
3.1 Format of the IANA Language Subtag Registry . . . . . . . 18 3.1 Format of the IANA Language Subtag Registry . . . . . . . 18
3.2 Maintenance of the Registry . . . . . . . . . . . . . . . 23 3.2 Maintenance of the Registry . . . . . . . . . . . . . . . 23
3.3 Stability of IANA Registry Entries . . . . . . . . . . . . 25 3.3 Stability of IANA Registry Entries . . . . . . . . . . . . 25
3.4 Registration Procedure for Subtags . . . . . . . . . . . . 28 3.4 Registration Procedure for Subtags . . . . . . . . . . . . 28
3.5 Possibilities for Registration . . . . . . . . . . . . . . 31 3.5 Possibilities for Registration . . . . . . . . . . . . . . 31
skipping to change at page 4, line 17 skipping to change at page 4, line 17
The language tag always defines a language as used (which includes The language tag always defines a language as used (which includes
being spoken, written, signed, or otherwise signaled) by human being spoken, written, signed, or otherwise signaled) by human
beings for communication of information to other human beings. beings for communication of information to other human beings.
Computer languages such as programming languages are explicitly Computer languages such as programming languages are explicitly
excluded. excluded.
2.1 Syntax 2.1 Syntax
The language tag is composed of one or more parts or "subtags". Each The language tag is composed of one or more parts or "subtags". Each
subtag consists of a sequence of alpha-numeric characters. Subtags subtag consists of a sequence of alpha-numeric characters. Subtags
are distinguished and separated from one another by a hyphen ("-"). are distinguished and separated from one another by a hyphen ("-",
A language tag consists of a "primary language" subtag and a ABNF %x2D). A language tag consists of a "primary language" subtag
(possibly empty) series of subsequent subtags, each of which refines and a (possibly empty) series of subsequent subtags, each of which
or narrows the range of language identified by the overall tag. refines or narrows the range of language identified by the overall
tag.
Each type of subtag is distinguished by length, position in the tag, Each type of subtag is distinguished by length, position in the tag,
and content: subtags can be recognized solely by these features. and content: subtags can be recognized solely by these features.
This makes it possible to construct a parser that can extract and This makes it possible to construct a parser that can extract and
assign some semantic information to the subtags, even if the specific assign some semantic information to the subtags, even if the specific
subtag values are not recognized. Thus a parser need not have an up- subtag values are not recognized. Thus a parser need not have an up-
to-date copy (or any copy at all) of the subtag registry to perform to-date copy (or any copy at all) of the subtag registry to perform
most searching and matching operations. most searching and matching operations.
The syntax of the language tag in ABNF [RFC2234bis] is: The syntax of the language tag in ABNF [RFC2234bis] is:
skipping to change at page 5, line 39 skipping to change at page 5, line 39
; Single letters: x/X is reserved for private use ; Single letters: x/X is reserved for private use
registered-lang = 4*8ALPHA ; registered language subtag registered-lang = 4*8ALPHA ; registered language subtag
grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum)) grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum))
; grandfathered registration ; grandfathered registration
; Note: i is the only singleton ; Note: i is the only singleton
; that starts a grandfathered tag ; that starts a grandfathered tag
alphanum = (ALPHA / DIGIT) ; letters and numbers alphanum = (ALPHA / DIGIT) ; letters and numbers
Figure 1: Language Tag ABNF Figure 1: Language Tag ABNF
The character "-" is HYPHEN-MINUS (ABNF: %x2D). All subtags have a Note: There is a subtlety in the ABNF for 'variant': variants
maximum length of eight characters. Note that there is a subtlety in starting with a digit MAY be four characters long, while those
the ABNF for 'variant': variants starting with a digit MAY be four starting with a letter MUST be at least five characters long.
characters long, while those starting with a letter MUST be at least
five characters long.
Whitespace is not permitted in a language tag. For examples of All subtags have a maximum length of eight characters and whitespace
language tags, see Appendix B. is not permitted in a language tag. For examples of language tags,
see Appendix B.
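The length and character constraints stated above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the normative grammar (that is Figure 1): the function names are hypothetical, and the checks cover only the rules quoted in this section (US-ASCII alphanumeric subtags, hyphen separators, the eight-character maximum, no whitespace, and the digit/letter subtlety for variants).

```python
import re

# One subtag: 1 to 8 US-ASCII letters or digits (hypothetical helper pattern).
_ALPHANUM = re.compile(r"[A-Za-z0-9]{1,8}")


def subtags_are_well_shaped(tag):
    """True if every hyphen-separated subtag is 1-8 ASCII alphanumerics.

    Whitespace, empty subtags, and over-long subtags all fail the pattern.
    This does not check subtag order or registry membership.
    """
    return all(_ALPHANUM.fullmatch(part) for part in tag.split("-"))


def has_variant_shape(subtag):
    """True if the subtag has the shape the text describes for a variant.

    Five to eight alphanumerics are always acceptable; four characters are
    acceptable only when the first one is a digit.
    """
    if not _ALPHANUM.fullmatch(subtag):
        return False
    if 5 <= len(subtag) <= 8:
        return True
    return len(subtag) == 4 and subtag[0].isdigit()
```

A shape check of this kind needs no copy of the registry, which is the property the syntax is designed to provide; whether a well-shaped subtag is actually registered is a separate question.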
Note that although [RFC2234bis] refers to octets, the language tags Note that although [RFC2234bis] refers to octets, the language tags
described in this document are sequences of characters from the US- described in this document are sequences of characters from the US-
ASCII repertoire. Language tags MAY be used in documents and ASCII repertoire. Language tags MAY be used in documents and
applications that use other encodings, so long as these encompass the applications that use other encodings, so long as these encompass the
US-ASCII repertoire. An example of this would be an XML document US-ASCII repertoire. An example of this would be an XML document
that uses the UTF-16LE [RFC2781] encoding of [Unicode]. that uses the UTF-16LE [RFC2781] encoding of [Unicode].
The tags and their subtags, including private-use and extensions, are The tags and their subtags, including private-use and extensions, are
to be treated as case insensitive: there exist conventions for the to be treated as case insensitive: there exist conventions for the
skipping to change at page 12, line 19 skipping to change at page 12, line 15
used to form language tags that represent the country or used to form language tags that represent the country or
region for which they are defined. region for which they are defined.
D. UN numeric codes for countries or areas for which there is an D. UN numeric codes for countries or areas for which there is an
associated ISO 3166 alpha-2 code in the registry MUST NOT be associated ISO 3166 alpha-2 code in the registry MUST NOT be
entered into the registry and MUST NOT be used to form entered into the registry and MUST NOT be used to form
language tags. Note that the ISO 3166-based subtag in the language tags. Note that the ISO 3166-based subtag in the
registry MUST actually be associated with the UN M.49 code in registry MUST actually be associated with the UN M.49 code in
question. question.
E. All other UN numeric codes for countries or areas which do E. UN numeric codes and ISO 3166 alpha-2 codes for countries or
areas listed as eligible for registration in [initial-
registry] but not presently registered MAY be entered into
the IANA registry via the process described in Section 3.4.
Once registered, these codes MAY be used to form language
tags.
F. All other UN numeric codes for countries or areas which do
not have an associated ISO 3166 alpha-2 code MUST NOT be not have an associated ISO 3166 alpha-2 code MUST NOT be
entered into the registry and MUST NOT be used to form entered into the registry and MUST NOT be used to form
language tags. For more information about these codes, see language tags. For more information about these codes, see
Section 3.3. Section 3.3.
4. Note: The alphanumeric codes in Appendix X of the UN document 4. Note: The alphanumeric codes in Appendix X of the UN document
MUST NOT be entered into the registry and MUST NOT be used to MUST NOT be entered into the registry and MUST NOT be used to
form language tags. (At the time this document was created these form language tags. (At the time this document was created these
values match the ISO 3166 alpha-2 codes.) values match the ISO 3166 alpha-2 codes.)
skipping to change at page 25, line 18 skipping to change at page 25, line 18
critical to the long term stability of language tags. The rules in critical to the long term stability of language tags. The rules in
this section guarantee that a specific language tag's meaning is this section guarantee that a specific language tag's meaning is
stable over time and will not change. stable over time and will not change.
These rules specifically deal with how changes to codes (including These rules specifically deal with how changes to codes (including
withdrawal and deprecation of codes) maintained by ISO 639, ISO withdrawal and deprecation of codes) maintained by ISO 639, ISO
15924, ISO 3166, and UN M.49 are reflected in the IANA Language 15924, ISO 3166, and UN M.49 are reflected in the IANA Language
Subtag Registry. Assignments to the IANA Language Subtag Registry Subtag Registry. Assignments to the IANA Language Subtag Registry
MUST follow the following stability rules: MUST follow the following stability rules:
o Values in the fields 'Type', 'Subtag', 'Tag', 'Added', 1. Values in the fields 'Type', 'Subtag', 'Tag', 'Added',
'Deprecated' and 'Preferred-Value' MUST NOT be changed and are 'Deprecated' and 'Preferred-Value' MUST NOT be changed and are
guaranteed to be stable over time. guaranteed to be stable over time.
o Values in the 'Description' field MUST NOT be changed in a way 2. Values in the 'Description' field MUST NOT be changed in a way
that would invalidate previously-existing tags. They MAY be that would invalidate previously-existing tags. They MAY be
broadened somewhat in scope, changed to add information, or broadened somewhat in scope, changed to add information, or
adapted to the most common modern usage. For example, countries adapted to the most common modern usage. For example, countries
occasionally change their official names: an historical example of occasionally change their official names: an historical example
this would be "Upper Volta" changing to "Burkina Faso". of this would be "Upper Volta" changing to "Burkina Faso".
o Values in the field 'Prefix' MAY be added to records of type 3. Values in the field 'Prefix' MAY be added to records of type
'variant' via the registration process. 'variant' via the registration process.
o Values in the field 'Prefix' MAY be modified, so long as the 4. Values in the field 'Prefix' MAY be modified, so long as the
modifications broaden the set of prefixes. That is, a prefix MAY modifications broaden the set of prefixes. That is, a prefix
be replaced by one of its own prefixes. For example, the prefix MAY be replaced by one of its own prefixes. For example, the
"en-US" could be replaced by "en", but not by the prefixes "en- prefix "en-US" could be replaced by "en", but not by the
Latn", "fr", or "en-US-boont". If one of those prefixes were prefixes "en-Latn", "fr", or "en-US-boont". If one of those
needed, a new Prefix SHOULD be registered. prefixes were needed, a new Prefix SHOULD be registered.
o Values in the field 'Prefix' MUST NOT be removed. 5. Values in the field 'Prefix' MUST NOT be removed.
o The field 'Comments' MAY be added, changed, modified, or removed 6. The field 'Comments' MAY be added, changed, modified, or removed
via the registration process or any of the processes or via the registration process or any of the processes or
considerations described in this section. considerations described in this section.
o The field 'Suppress-Script' MAY be added or removed via the 7. The field 'Suppress-Script' MAY be added or removed via the
registration process. registration process.
o Codes assigned by ISO 639, ISO 15924, and ISO 3166 that do not 8. Codes assigned by ISO 639, ISO 15924, and ISO 3166 that do not
conflict with existing subtags of the associated type and whose conflict with existing subtags of the associated type and whose
meaning is not the same as an existing subtag of the same type are meaning is not the same as an existing subtag of the same type
entered into the IANA registry as new records. are entered into the IANA registry as new records.
o Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are 9. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are
withdrawn by their respective maintenance or registration withdrawn by their respective maintenance or registration
authority remain valid in language tags. A 'Deprecated' field authority remain valid in language tags. A 'Deprecated' field
containing the date of withdrawal is added to the record. If a containing the date of withdrawal is added to the record. If a
new record of the same type is added that represents a replacement new record of the same type is added that represents a
value, then a 'Preferred-Value' field MAY also be added. The replacement value, then a 'Preferred-Value' field MAY also be
registration process MAY be used to add comments about the added. The registration process MAY be used to add comments
withdrawal of the code by the respective standard. about the withdrawal of the code by the respective standard.
* The region code 'TL' was assigned to the country 'Timor-Leste', 1. The region code 'TL' was assigned to the country 'Timor-
replacing the code 'TP' (which was assigned to 'East Timor' Leste', replacing the code 'TP' (which was assigned to 'East
when it was under administration by Portugal). The subtag 'TP' Timor' when it was under administration by Portugal). The
remains valid in language tags, but its record contains a subtag 'TP' remains valid in language tags, but its record
'Preferred-Value' of 'TL' and its field 'Deprecated' contains contains a 'Preferred-Value' of 'TL' and its field
the date the new code was assigned ('2004-07-06'). 'Deprecated' contains the date the new code was assigned
('2004-07-06').
o Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict 10. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict
with existing subtags of the associated type, including subtags with existing subtags of the associated type, including subtags
that are deprecated, MUST NOT be entered into the registry. The that are deprecated, MUST NOT be entered into the registry. The
following additional considerations apply to subtag values that following additional considerations apply to subtag values that
are reassigned: are reassigned:
* For ISO 639 codes, if the newly assigned code's meaning is not A. For ISO 639 codes, if the newly assigned code's meaning is
represented by a subtag in the IANA registry, the Language not represented by a subtag in the IANA registry, the
Subtag Reviewer, as described in Section 3.4, SHALL prepare a Language Subtag Reviewer, as described in Section 3.4, SHALL
proposal for entering in the IANA registry as soon as practical prepare a proposal for entering in the IANA registry as soon
a registered language subtag as an alternate value for the new as practical a registered language subtag as an alternate
code. The form of the registered language subtag will be at value for the new code. The form of the registered language
the discretion of the Language Subtag Reviewer and MUST conform subtag will be at the discretion of the Language Subtag
to other restrictions on language subtags in this document. Reviewer and MUST conform to other restrictions on language
subtags in this document.
* For all subtags whose meaning is derived from an external B. For all subtags whose meaning is derived from an external
standard (i.e. ISO 639, ISO 15924, ISO 3166, or UN M.49), if a standard (i.e. ISO 639, ISO 15924, ISO 3166, or UN M.49),
new meaning is assigned to an existing code and the new meaning if a new meaning is assigned to an existing code and the new
broadens the meaning of that code, then the meaning for the meaning broadens the meaning of that code, then the meaning
associated subtag MAY be changed to match. The meaning of a for the associated subtag MAY be changed to match. The
subtag MUST NOT be narrowed, however, as this can result in an meaning of a subtag MUST NOT be narrowed, however, as this
unknown proportion of the existing uses of a subtag becoming can result in an unknown proportion of the existing uses of
invalid. Note: ISO 639 MA/RA has adopted a similar stability a subtag becoming invalid. Note: ISO 639 MA/RA has adopted
policy. a similar stability policy.
* For ISO 15924 codes, if the newly assigned code's meaning is C. For ISO 15924 codes, if the newly assigned code's meaning is
not represented by a subtag in the IANA registry, the Language not represented by a subtag in the IANA registry, the
Subtag Reviewer, as described in Section 3.4, SHALL prepare a Language Subtag Reviewer, as described in Section 3.4, SHALL
proposal for entering in the IANA registry as soon as practical prepare a proposal for entering in the IANA registry as soon
a registered variant subtag as an alternate value for the new as practical a registered variant subtag as an alternate
code. The form of the registered variant subtag will be at the value for the new code. The form of the registered variant
discretion of the Language Subtag Reviewer and MUST conform to subtag will be at the discretion of the Language Subtag
other restrictions on variant subtags in this document. Reviewer and MUST conform to other restrictions on variant
subtags in this document.
* For ISO 3166 codes, if the newly assigned code's meaning is D. For ISO 3166 codes, if the newly assigned code's meaning is
associated with the same UN M.49 code as another 'region' associated with the same UN M.49 code as another 'region'
subtag, then the existing region subtag remains as the subtag, then the existing region subtag remains as the
preferred value for that region and no new entry is created. A preferred value for that region and no new entry is created.
comment MAY be added to the existing region subtag indicating A comment MAY be added to the existing region subtag
the relationship to the new ISO 3166 code. indicating the relationship to the new ISO 3166 code.
* For ISO 3166 codes, if the newly assigned code's meaning is E. For ISO 3166 codes, if the newly assigned code's meaning is
associated with a UN M.49 code that is not represented by an associated with a UN M.49 code that is not represented by an
existing region subtag, then the Language Subtag Reviewer, as existing region subtag, then the Language Subtag Reviewer,
described in Section 3.4, SHALL prepare a proposal for entering as described in Section 3.4, SHALL prepare a proposal for
the appropriate UN M.49 country code as an entry in the IANA entering the appropriate UN M.49 country code as an entry in
registry. the IANA registry.
* Codes assigned by UN M.49 to countries or areas (as opposed to F. For ISO 3166 codes, if there is no associated UN numeric
geographical regions and sub-regions) for which there is no code, then the Language Subtag Reviewer SHALL petition the
corresponding ISO 3166 code MUST NOT be registered, except UN to create one. If there is no response from the UN
under the previous provision. If it is necessary to identify a within ninety days of the request being sent, the Language
region for which only a UN M.49 code exists in language tags, Subtag Reviewer SHALL prepare a proposal for entering in the
then the registration authority for ISO 3166 SHOULD be IANA registry as soon as practical a registered variant
petitioned to assign a code, which can then be registered for subtag as an alternate value for the new code. The form of
use in language tags. At the time this document was written, the registered variant subtag will be at the discretion of
there were only four such codes: 830 (Channel Islands), 831 the Language Subtag Reviewer and MUST conform to other
(Guernsey), 832 (Jersey), and 833 (Isle of Man). This rule restrictions on variant subtags in this document. This
exists so that UN M.49 codes remain available as the value of situation is very unlikely to ever occur.
last resort in cases where ISO 3166 reassigns a deprecated
value in the registry.
* For ISO 3166 codes, if there is no associated UN numeric code, 11. Codes assigned by UN M.49 to countries or areas (as opposed to
then the Language Subtag Reviewer SHALL petition the UN to geographical regions and sub-regions) for which there is no
create one. If there is no response from the UN within ninety corresponding ISO 3166 code MUST NOT be registered, except under
days of the request being sent, the Language Subtag Reviewer the previous provisions (as a surrogate for an ISO 3166 code
SHALL prepare a proposal for entering in the IANA registry as that cannot itself be registered). If it is necessary to
soon as practical a registered variant subtag as an alternate identify a region for which only a UN M.49 code exists in
value for the new code. The form of the registered variant language tags, then the registration authority for ISO 3166
subtag will be at the discretion of the Language Subtag SHOULD be petitioned to assign a code, which can then be
Reviewer and MUST conform to other restrictions on variant registered for use in language tags. At the time this document
subtags in this document. This situation is very unlikely to was written, there were only four such codes: 830 (Channel
ever occur. Islands), 831 (Guernsey), 832 (Jersey), and 833 (Isle of Man).
This rule exists so that UN M.49 codes remain available as the
value of last resort in cases where ISO 3166 reassigns a
deprecated value in the registry.
o Stability provisions apply to grandfathered tags with this 12. Stability provisions apply to grandfathered tags with this
exception: should all of the subtags in a grandfathered tag become exception: should all of the subtags in a grandfathered tag
valid subtags in the IANA registry, then the field 'Type' in that become valid subtags in the IANA registry, then the field 'Type'
record is changed from 'grandfathered' to 'redundant'. Note that in that record is changed from 'grandfathered' to 'redundant'.
this will not affect language tags that match the grandfathered Note that this will not affect language tags that match the
tag, since these tags will now match valid generative subtag grandfathered tag, since these tags will now match valid
sequences. For example, if the subtag 'gan' in the language tag generative subtag sequences. For example, if the subtag 'gan'
"zh-gan" were to be registered as an extended language subtag, in the language tag "zh-gan" were to be registered as an
then the grandfathered tag "zh-gan" would be deprecated (but extended language subtag, then the grandfathered tag "zh-gan"
existing content or implementations that use "zh-gan" would remain would be deprecated (but existing content or implementations
valid). that use "zh-gan" would remain valid).
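The 'Prefix' broadening rule in the list above (a prefix may be replaced by one of its own prefixes) can be sketched as a comparison of subtag sequences. This is a minimal illustration, not part of the draft: the function name is hypothetical, and treating an unchanged prefix as "not a broadening" is an assumption of the sketch.

```python
def is_broadening(old_prefix, new_prefix):
    """True if new_prefix is a strictly shorter leading run of old_prefix.

    The comparison is made subtag by subtag and ignores case, since tags
    are to be treated as case insensitive.
    """
    old = old_prefix.lower().split("-")
    new = new_prefix.lower().split("-")
    return len(new) < len(old) and old[:len(new)] == new
```

With the examples from the text, replacing "en-US" by "en" passes, while "en-Latn", "fr", and "en-US-boont" are all rejected. Comparing whole subtags rather than raw characters matters: a character-level test would wrongly accept "e" as a prefix of "en-US".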
3.4 Registration Procedure for Subtags 3.4 Registration Procedure for Subtags
The procedure given here MUST be used by anyone who wants to use a The procedure given here MUST be used by anyone who wants to use a
subtag not currently in the IANA Language Subtag Registry. subtag not currently in the IANA Language Subtag Registry.
Only subtags of type 'language' and 'variant' will be considered for Only subtags of type 'language' and 'variant' will be considered for
independent registration of new subtags. Handling of subtags needed independent registration of new subtags. Handling of subtags needed
for stability and subtags necessary to keep the registry synchronized for stability and subtags necessary to keep the registry synchronized
with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits
skipping to change at page 50, line 9 skipping to change at page 50, line 9
as used in Japanese, Chinese, and Korean). When language tags are as used in Japanese, Chinese, and Korean). When language tags are
applied to spans of text, rendering engines can use that information applied to spans of text, rendering engines can use that information
in deciding which font to use in the absence of other information, in deciding which font to use in the absence of other information,
particularly where languages with distinct writing traditions use the particularly where languages with distinct writing traditions use the
same characters. same characters.
8. Changes from RFC 3066 8. Changes from RFC 3066
The main goals for this revision of language tags were the following: The main goals for this revision of language tags were the following:
*Compatibility.* All valid RFC 3066 language tags (including those *Compatibility.* All RFC 3066 language tags (including those in the
in the IANA registry) remain valid in this specification. Thus IANA registry) remain valid in this specification. The changes in
there is complete backward compatibility of this specification with this document represent additional constraints on language tags.
existing content. In addition, this document defines language tags That is, in no case is the syntax more permissive and processors
in such a way as to ensure future compatibility, and processors based on the RFC 3066 ABNF (such as those described in [XMLSchema])
based solely on the RFC 3066 ABNF (such as those described in will be able to process the tags described by this document. In
[XMLSchema]) will be able to process tags described by this document. addition, this document defines language tags in such a way as to
ensure future compatibility.
*Stability.* Because of the changes in underlying ISO standards, a *Stability.* Because of changes in the past in the underlying ISO
valid RFC 3066 language tag may become invalid (or have its meaning standards, a valid RFC 3066 language tag could become invalid or have
change) at a later date. With so much of the world's computing its meaning change. This has the potential of invalidating content
infrastructure dependent on language tags, this is simply that may have an extensive shelf-life. In this specification, once a
unacceptable: it invalidates content that may have an extensive language tag is valid, it remains valid forever.
shelf-life. In this specification, once a language tag is valid, it
remains valid forever. Previously, there was no way to determine
when two tags were equivalent. This specification provides a stable
mechanism for doing so, through the use of canonical forms. These
are also stable, so that implementations can depend on the use of
canonical forms to assess equivalency.
*Validity.* The structure of language tags defined by this document *Validity.* The structure of language tags defined by this document
makes it possible to determine if a particular tag is well-formed makes it possible to determine if a particular tag is well-formed
without regard for the actual content or "meaning" of the tag as a without regard for the actual content or "meaning" of the tag as a
whole. This is important because the registry and underlying whole. This is important because the registry grows and underlying
standards change over time. In addition, it must be possible to standards change over time. In addition, it must be possible to
determine if a tag is valid (or not) for a given point in time in determine if a tag is valid (or not) for a given point in time in
order to provide reproducible, testable results. This process must order to provide reproducible, testable results. This process must
not be error-prone; otherwise even intelligent people will generate not be error-prone; otherwise implementations might give different
implementations that give different results. This specification results. By having an authoritative registry with specific
provides for that by having a single data file, with specific versioning information, the validity of language tags at any point in
versioning information, so that the validity of language tags at any time can be precisely determined (instead of interpolating values
point in time can be precisely determined (instead of interpolating from many separate sources).
values from many separate sources).
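The point-in-time validity check described above can be sketched as follows. This is a minimal illustration, not the registry format defined by this document: the `REGISTRY` mapping, the subtag names `example1` and `example2`, and their dates are all hypothetical placeholders, and `valid_as_of` is a name invented here.

```python
from datetime import date

# Hypothetical snapshot: subtag -> date it entered the registry.
# These are placeholder entries, not real registry data.
REGISTRY = {
    "example1": date(2005, 1, 1),
    "example2": date(2005, 12, 1),
}


def valid_as_of(subtag, when):
    """Return True if `subtag` was already registered on the date `when`.

    Because a subtag stays valid once registered, only the date it was
    added matters; there is no removal date to consult.
    """
    added = REGISTRY.get(subtag.lower())
    return added is not None and added <= when


print(valid_as_of("example1", date(2005, 6, 28)))
print(valid_as_of("example2", date(2005, 6, 28)))
```

Given the same snapshot and the same date, two implementations written this way should return the same answer, which is the reproducibility the paragraph asks for.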
*Extensibility.* It is important to be able to differentiate between *Utility.* It is sometimes important to be able to differentiate
written forms of language -- for many implementations this is more between written forms of a language -- for many implementations this
important than distinguishing between spoken variants of a language. is more important than distinguishing between the spoken variants of
Languages are written in a wide variety of different scripts, so this a language. Languages are written in a wide variety of different
document provides for the generative use of ISO 15924 script codes. scripts, so this document provides for the generative use of ISO
Like the generative use of ISO language and country codes in RFC 15924 script codes. Like the generative use of ISO language and
3066, this allows combinations to be produced without resorting to country codes in RFC 3066, this allows combinations to be produced
the registration process. The addition of UN codes provides for the without resorting to the registration process. The addition of UN
generation of language tags with regional scope, which is also M.49 codes provides for the generation of language tags with regional
required for information technology. scope, which is also required by some applications.
The recast of the registry from containing whole language tags to The recast of the registry from containing whole language tags to
subtags is a key part of this. An important feature of RFC 3066 was subtags is a key part of this. An important feature of RFC 3066 was
that it allowed generative use of subtags. This allows people to that it allowed generative use of subtags. This allows people to
meaningfully use generated tags, without the delays in registering meaningfully use generated tags, without the delays in registering
whole tags, and the burden on the registry of having to supply all of whole tags or the need to register all of the combinations that might
the combinations that people may find useful. be useful.
Because of the widespread use of language tags, it is potentially The choice of placing the extended language and script subtags
disruptive to have periodic revisions of the core specification, between the primary language and region subtags was widely debated.
despite demonstrated need. The extension mechanism provides for a This design was chosen because the prevalent matching and content
way for independent RFCs to define extensions to language tags. negotiation schemes rely on the subtags being arranged in order of
These extensions have a very constrained, well-defined structure to increasing specificity. That is, the subtags that mark a greater
prevent extensions from interfering with implementations of language barrier to mutual intelligibility appear left-most in a tag. For
tags defined in this document. The document also anticipates example, when selecting content written in Azerbaijani, the script
features of ISO 639-3 with the addition of the extended language (Arabic, Cyrillic, or Latin) represents a greater barrier to
subtags, as well as the possibility of other ISO 639 parts becoming understanding than any regional variations (those associated with
useful for the formation of language tags in the future. The use and Azerbaijan or Iran, for example). Individuals who prefer documents
definition of private use tags has also been modified, to allow in a particular script, but can deal with the minor regional
people to move as much information as possible out of private use differences, can therefore select appropriate content. Applications
tags, and into the regular structure. The goal is to dramatically that do not deal with written content will continue to omit these
reduce the need to produce a revision of this document in the future. subtags.
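The effect of ordering subtags by increasing specificity can be illustrated with simple prefix matching. The sketch below assumes a matching scheme in which a requested range matches any tag that equals it or extends it at a subtag boundary; the function names `matches` and `select` are invented for this illustration, and the list of available tags is an example built from the Azerbaijani case above.

```python
from typing import List


def matches(range_: str, tag: str) -> bool:
    """Return True if `tag` equals `range_` or extends it at a '-' boundary."""
    r = range_.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


def select(preferred: str, available: List[str]) -> List[str]:
    """Return the available tags matched by the preferred range, in order."""
    return [tag for tag in available if matches(preferred, tag)]


# Example content tags: Azerbaijani in three scripts and two regions.
available = ["az-Arab-IR", "az-Cyrl-AZ", "az-Latn-AZ", "az-Latn-IR"]

# A reader who wants Latin script but accepts either region:
print(select("az-Latn", available))  # ['az-Latn-AZ', 'az-Latn-IR']
```

If the region came before the script, the range "az-Latn" could not be expressed as a prefix, and the reader would have to list every region separately.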
*Extensibility.* Because of the widespread use of language tags, it
is disruptive to have periodic revisions of the core specification,
even in the face of demonstrated need. The extension mechanism
provides for a way for independent RFCs to define extensions to
language tags. These extensions have a very constrained, well-
defined structure that prevents extensions from interfering with
implementations of language tags defined in this document.
The document also anticipates features of ISO 639-3 with the addition
of the extended language subtags, as well as the possibility of other
ISO 639 parts becoming useful for the formation of language tags in
the future.
The use and definition of private use tags have also been modified, to
allow people to use private use subtags to extend or modify defined
tags and to move as much information as possible out of private use
and into the regular structure.
The goal for each of these modifications is to reduce or eliminate
the need for future revisions of this document.
The specific changes in this document to meet these goals are: The specific changes in this document to meet these goals are:
o Defines the ABNF and rules for subtags so that the category of all o Defines the ABNF and rules for subtags so that the category of all
subtags can be determined without reference to the registry. subtags can be determined without reference to the registry.
o Adds the concept of well-formed vs. validating processors, o Adds the concept of well-formed vs. validating processors,
defining the rules by which an implementation can claim to be one defining the rules by which an implementation can claim to be one
or the other. or the other.
skipping to change at page 52, line 21 skipping to change at page 52, line 39
region subtags respectively. region subtags respectively.
o Adds a well-defined extension mechanism. o Adds a well-defined extension mechanism.
o Defines an extended language subtag, possibly for use with certain o Defines an extended language subtag, possibly for use with certain
anticipated features of ISO 639-3. anticipated features of ISO 639-3.
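The first item in the list above, determining the category of a subtag without reference to the registry, can be sketched as a classifier that looks only at length and character class. The shapes used here (3 letters for an extended language subtag, 4 letters for a script, 2 letters or 3 digits for a region, 5 to 8 alphanumerics or a digit followed by 3 alphanumerics for a variant, a single character for a singleton) are assumptions based on a reading of the ABNF, which is not reproduced in this excerpt. The sketch does not enforce subtag order or handle grandfathered tags, and the name `classify` is invented for this illustration.

```python
import re

# Assumed subtag shapes; each pattern is tried in order with fullmatch.
SHAPES = [
    ("extlang", re.compile(r"[a-z]{3}")),
    ("script", re.compile(r"[a-z]{4}")),
    ("region", re.compile(r"[a-z]{2}|[0-9]{3}")),
    ("variant", re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")),
]


def classify(tag):
    """Label each subtag of `tag` by shape alone, with no registry lookup."""
    subtags = tag.lower().split("-")
    labels = [(subtags[0], "language")]
    for i, subtag in enumerate(subtags[1:], start=1):
        if len(subtag) == 1:
            # A singleton starts an extension or private use sequence;
            # everything after it belongs to that sequence.
            kind = "privateuse" if subtag == "x" else "extension"
            labels.append((subtag, "singleton"))
            labels.extend((s, kind) for s in subtags[i + 1:])
            break
        for kind, pattern in SHAPES:
            if pattern.fullmatch(subtag):
                labels.append((subtag, kind))
                break
        else:
            labels.append((subtag, "ill-formed"))
    return labels


print(classify("az-Latn-IR"))
print(classify("de-CH-1996"))
```

A well-formed processor in the sense of the second item could stop here; a validating processor would additionally check each labeled subtag against the registry.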
Ed Note: The following items are provided for the convenience of Ed Note: The following items are provided for the convenience of
reviewers and will be removed from the final document. reviewers and will be removed from the final document.
Changes between draft-ietf-ltru-registry-06 and this version are: Changes between draft-ietf-ltru-registry-07 and this version are:
o Modified the rules for creating the initial-registry draft to
require purposefully omitted but eligible codes to be listed
(#1034)(R.Presuhn)
o Removed the example registry. The initial-registry draft is a
better example. Added an informative reference to that document.
(A.Phillips)
o Modified the introduction to Section 2.2.4 and changed the use of
"as used in" for some examples to clarify how UN M.49 codes and
other larger regional codes are related to language tags.
(K.Broome, P.Constable)
o Removed nearly all of the text from Section 3.7 to [initial- o Removed the reference to RFC 3066 and RFC 1766 from the abstract.
registry]. A bit of new glue text pointing to that document was (F.Ellermann)
added. (F.Ellermann)
o Updated Section 5 to reflect the removal of most of o Minor tweaking of the text in Section 2.1. (A.Phillips)
the text in Section 3.7 and to generally clean it up. This
includes breaking it into subsections. (A.Phillips)
9. References 9. References
9.1 Normative References 9.1 Normative References
[ISO639-1] [ISO639-1]
International Organization for Standardization, "ISO 639- International Organization for Standardization, "ISO 639-
1:2002, Codes for the representation of names of languages 1:2002, Codes for the representation of names of languages
-- Part 1: Alpha-2 code", ISO Standard 639, 2002, <ISO -- Part 1: Alpha-2 code", ISO Standard 639, 2002, <ISO
639-1>. 639-1>.
 End of changes. 

This html diff was produced by rfcdiff 1.24, available from http://www.levkowetz.com/ietf/tools/rfcdiff/