draft-ietf-ltru-matching-12.txt   draft-ietf-ltru-matching-13.txt 
Network Working Group A. Phillips, Ed. Network Working Group A. Phillips, Ed.
Internet-Draft Yahoo! Inc. Internet-Draft Yahoo! Inc.
Obsoletes: 3066 (if approved) M. Davis, Ed. Obsoletes: 3066 (if approved) M. Davis, Ed.
Expires: October 8, 2006 Google Expires: November 19, 2006 Google
April 6, 2006 May 18, 2006
Matching of Language Tags Matching of Language Tags
draft-ietf-ltru-matching-12 draft-ietf-ltru-matching-13
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on October 8, 2006. This Internet-Draft will expire on November 19, 2006.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2006). Copyright (C) The Internet Society (2006).
Abstract Abstract
This document describes different mechanisms for comparing and This document describes a syntax, called a "language-range", for
matching language tags. Possible algorithms for language negotiation specifying items in a user's language preferences, called a "language
or content selection, filtering, and lookup are described. This priority list". It also describes different mechanisms for comparing
document, in combination with RFC 3066bis (Ed.: replace "3066bis" and matching these to language tags. Two kinds of matching
with the RFC number assigned to draft-ietf-ltru-registry-14), mechanisms, filtering and lookup, are defined. Filtering produces a
replaces RFC 3066, which replaced RFC 1766. (potentially empty) set of language tags, whereas lookup produces a
single language tag. Possible applications include language
negotiation or content selection. This document, in combination with
RFC 3066bis (Ed.: replace "3066bis" with the RFC number assigned to
draft-ietf-ltru-registry-14), replaces RFC 3066, which replaced RFC
1766.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. The Language Range . . . . . . . . . . . . . . . . . . . . . . 4 2. The Language Range . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Basic Language Range . . . . . . . . . . . . . . . . . . . 4 2.1. Basic Language Range . . . . . . . . . . . . . . . . . . . 4
2.2. Extended Language Range . . . . . . . . . . . . . . . . . 5 2.2. Extended Language Range . . . . . . . . . . . . . . . . . 5
2.3. The Language Priority List . . . . . . . . . . . . . . . . 5 2.3. The Language Priority List . . . . . . . . . . . . . . . . 5
3. Types of Matching . . . . . . . . . . . . . . . . . . . . . . 7 3. Types of Matching . . . . . . . . . . . . . . . . . . . . . . 7
3.1. Choosing a Type of Matching . . . . . . . . . . . . . . . 7 3.1. Choosing a Matching Scheme . . . . . . . . . . . . . . . . 7
3.2. Implementation Considerations . . . . . . . . . . . . . . 8 3.2. Implementation Considerations . . . . . . . . . . . . . . 8
3.3. Filtering . . . . . . . . . . . . . . . . . . . . . . . . 9 3.3. Filtering . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3.1. Basic Filtering . . . . . . . . . . . . . . . . . . . 10 3.3.1. Basic Filtering . . . . . . . . . . . . . . . . . . . 10
3.3.2. Extended Filtering . . . . . . . . . . . . . . . . . . 10 3.3.2. Extended Filtering . . . . . . . . . . . . . . . . . . 11
3.4. Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.4. Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4.1. Default Values . . . . . . . . . . . . . . . . . . . . 14 3.4.1. Default Values . . . . . . . . . . . . . . . . . . . . 14
4. Other Considerations . . . . . . . . . . . . . . . . . . . . . 16 4. Other Considerations . . . . . . . . . . . . . . . . . . . . . 16
4.1. Choosing Language Ranges . . . . . . . . . . . . . . . . . 16 4.1. Choosing Language Ranges . . . . . . . . . . . . . . . . . 16
4.2. Meaning of Language Tags and Ranges . . . . . . . . . . . 17 4.2. Meaning of Language Tags and Ranges . . . . . . . . . . . 17
4.3. Considerations for Private Use Subtags . . . . . . . . . . 17 4.3. Considerations for Private Use Subtags . . . . . . . . . . 17
4.4. Length Considerations for Language Ranges . . . . . . . . 18 4.4. Length Considerations for Language Ranges . . . . . . . . 18
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19
6. Security Considerations . . . . . . . . . . . . . . . . . . . 20 6. Security Considerations . . . . . . . . . . . . . . . . . . . 20
7. Character Set Considerations . . . . . . . . . . . . . . . . . 21 7. Character Set Considerations . . . . . . . . . . . . . . . . . 21
skipping to change at page 3, line 9 skipping to change at page 3, line 9
8.1. Normative References . . . . . . . . . . . . . . . . . . . 22 8.1. Normative References . . . . . . . . . . . . . . . . . . . 22
8.2. Informative References . . . . . . . . . . . . . . . . . . 22 8.2. Informative References . . . . . . . . . . . . . . . . . . 22
Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 23 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 23
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24
Intellectual Property and Copyright Statements . . . . . . . . . . 25 Intellectual Property and Copyright Statements . . . . . . . . . . 25
1. Introduction 1. Introduction
Human beings on our planet have, past and present, used a number of Human beings on our planet have, past and present, used a number of
languages. There are many reasons why one would want to identify the languages. There are many reasons why one would want to identify the
language used when presenting or requesting information or in some language used when presenting or requesting information.
specific set of information items.
Applications, protocols, or specifications that use language Applications, protocols, or specifications that use language
identifiers, such as the language tags defined in [RFC3066bis], identifiers, such as the language tags defined in [RFC3066bis],
sometimes need to match language tags to a user's language sometimes need to match language tags to a user's language
preferences. preferences.
This document defines a syntax (called a language range (Section 2)) This document defines a syntax (called a language range (Section 2))
for specifying items in the user's list of language preferences for specifying items in the user's list of language preferences
(called a language priority list (Section 2.3)), as well as several (called a language priority list (Section 2.3)), as well as several
schemes for selecting or filtering sets of language tags by comparing schemes for selecting or filtering sets of language tags by comparing
skipping to change at page 4, line 39 skipping to change at page 4, line 39
wildcards vary according to the type of language range. wildcards vary according to the type of language range.
Language tags and thus language ranges are to be treated as case- Language tags and thus language ranges are to be treated as case-
insensitive: there exist conventions for the capitalization of some insensitive: there exist conventions for the capitalization of some
of the subtags, but these MUST NOT be taken to carry meaning. of the subtags, but these MUST NOT be taken to carry meaning.
Matching of language tags to language ranges MUST be done in a case- Matching of language tags to language ranges MUST be done in a case-
insensitive manner. insensitive manner.
2.1. Basic Language Range 2.1. Basic Language Range
A "basic language range" consists of a sequence of alphanumeric A "basic language range" has the same syntax as an [RFC3066] language
subtags separated by hyphens. It is defined by the following ABNF tag or is the single character "*". The basic language range was
[RFC4234]: originally described by HTTP/1.1 [RFC2616] and later [RFC3066]. It
is defined by the following ABNF [RFC4234]:
language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*" language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*"
alphanum = ALPHA / DIGIT alphanum = ALPHA / DIGIT
Basic language ranges (originally described by HTTP/1.1 [RFC2616] and A basic language range differs from the language tags defined in
later [RFC3066]) have the same syntax as an [RFC3066] language tag or [RFC3066bis] only in that there is no requirement that they be "well-
are the single character "*". They differ from the language tags formed" or be validated against the IANA Language Subtag Registry.
defined in [RFC3066bis] only in that there is no requirement that Such ill-formed ranges will probably not match anything. Note that
they be "well-formed" or be validated against the IANA Language the ABNF [RFC4234] in [RFC2616] is incorrect, since it disallows the
Subtag Registry. Such ill-formed ranges will probably not match use of digits anywhere in the 'language-range' (see:
anything. Note that the ABNF [RFC4234] in [RFC2616] is incorrect,
since it disallows the use of digits anywhere in the 'language-range' [RFC2616errata]).
(see: [RFC2616errata]).
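The ABNF above can be transcribed almost directly into a regular expression. The following is a minimal sketch in Python; the names BASIC_RANGE and is_basic_language_range are illustrative assumptions and do not appear in this document. It checks syntax only and, consistent with the text above, does not test whether a range is well-formed or registered.

```python
import re

# Transcription of: language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*"
# BASIC_RANGE is a hypothetical name chosen for this sketch.
BASIC_RANGE = re.compile(r"(?:[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*|\*)")


def is_basic_language_range(text: str) -> bool:
    """Return True if text fits the basic language range syntax."""
    return BASIC_RANGE.fullmatch(text) is not None
```

Under this sketch "de-CH-1996" and "*" are accepted, while "en-*-US" is rejected because a wildcard subtag is not permitted inside a basic language range.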
2.2. Extended Language Range 2.2. Extended Language Range
Occasionally users will wish to select a set of language tags based Occasionally users will wish to select a set of language tags based
on the presence of specific subtags. An "extended language range" on the presence of specific subtags. An "extended language range"
describes a user's language preference as an ordered sequence of describes a user's language preference as an ordered sequence of
subtags. For example, a user might wish to select all language tags subtags. For example, a user might wish to select all language tags
that contain the region subtag 'CH' (Switzerland). Extended language that contain the region subtag 'CH' (Switzerland). Extended language
ranges are useful in specifying a particular sequence of subtags that ranges are useful for specifying a particular sequence of subtags
appear in the set of matching tags without having to specify all of that appear in the set of matching tags without having to specify all
the intervening subtags. of the intervening subtags.
An extended language range can be represented by the following ABNF: An extended language range can be represented by the following ABNF:
extended-language-range = (1*8ALPHA / "*") extended-language-range = (1*8ALPHA / "*")
*("-" (1*8alphanum / "*")) *("-" (1*8alphanum / "*"))
Figure 2: Extended Language Range Figure 2: Extended Language Range
The wildcard subtag '*' can occur in any position in the extended The wildcard subtag '*' can occur in any position in the extended
language range, where it matches any sequence of subtags that might language range, where it matches any sequence of subtags that might
skipping to change at page 5, line 39 skipping to change at page 5, line 39
3.3.2). The use or absence of one or more wildcards cannot be taken 3.3.2). The use or absence of one or more wildcards cannot be taken
to imply that a certain number of subtags will appear in the matching to imply that a certain number of subtags will appear in the matching
set of language tags. set of language tags.
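The extended language range ABNF in Figure 2 can be checked in the same way. This is a sketch only; EXTENDED_RANGE and is_extended_language_range are hypothetical names introduced for illustration.

```python
import re

# Transcription of Figure 2:
#   extended-language-range = (1*8ALPHA / "*") *("-" (1*8alphanum / "*"))
# EXTENDED_RANGE is a hypothetical name chosen for this sketch.
EXTENDED_RANGE = re.compile(
    r"(?:[A-Za-z]{1,8}|\*)(?:-(?:[A-Za-z0-9]{1,8}|\*))*"
)


def is_extended_language_range(text: str) -> bool:
    """Return True if text fits the extended language range syntax."""
    return EXTENDED_RANGE.fullmatch(text) is not None
```

A range such as "*-CH", which selects on the region subtag 'CH' regardless of language, is accepted by this check, as is any basic language range.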
2.3. The Language Priority List 2.3. The Language Priority List
A user's language preferences will often need to specify more than A user's language preferences will often need to specify more than
one language range and thus users often need to specify a prioritized one language range and thus users often need to specify a prioritized
list of language ranges in order to best reflect their language list of language ranges in order to best reflect their language
preferences. This is especially true for speakers of minority preferences. This is especially true for speakers of minority
languages. A speaker of Breton in France, for example, may specify languages. A speaker of Breton in France, for example, can specify
"br" followed by "fr", meaning that if Breton is available, it is "br" followed by "fr", meaning that if Breton is available, it is
preferred, but otherwise French is the best alternative. It can get preferred, but otherwise French is the best alternative. It can get
more complex: a user may wish to fall back from Skolt Sami to more complex: a different user might want to fall back from Skolt
Northern Sami to Finnish. Sami to Northern Sami to Finnish.
A "language priority list" is a prioritized or weighted list of A "language priority list" is a prioritized or weighted list of
language ranges. One well known example of such a list is the language ranges. One well known example of such a list is the
"Accept-Language" header defined in RFC 2616 [RFC2616] (see Section "Accept-Language" header defined in RFC 2616 [RFC2616] (see Section
14.4) and RFC 3282 [RFC3282]. 14.4) and RFC 3282 [RFC3282].
The various matching operations described in this document include The various matching operations described in this document include
considerations for using a language priority list. This document considerations for using a language priority list. This document
does not define the syntax for a language priority list; defining does not define the syntax for a language priority list; defining
such a syntax is the responsibility of the protocol, application, or such a syntax is the responsibility of the protocol, application, or
specification that uses it. When given as examples in this document, specification that uses it. When given as examples in this document,
language priority lists will be shown as a quoted sequence of ranges language priority lists will be shown as a quoted sequence of ranges
separated by commas, like this: "en, fr, zh-Hant" (which would be separated by commas, like this: "en, fr, zh-Hant" (which is read
read as "English before French before Chinese as written in the "English before French before Chinese as written in the Traditional
Traditional script"). script").
A simple list of ranges is considered to be in descending order of A simple list of ranges is considered to be in descending order of
priority. Other language priority lists provide "quality weights" priority. Other language priority lists provide "quality weights"
for the language ranges in order to specify the relative priority of for the language ranges in order to specify the relative priority of
the user's language preferences. An example of this would be the use the user's language preferences. An example of this is the use of
of "q" values in the syntax of the "Accept-Language" header (defined "q" values in the syntax of the "Accept-Language" header (defined in
in [RFC2616], Section 14.4, and [RFC3282]). [RFC2616], Section 14.4, and [RFC3282]).
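As an illustration of how quality weights relate to a simple prioritized list, the sketch below orders weighted ranges by descending weight. It assumes the ranges have already been parsed into (range, weight) pairs by the protocol in question; it does not parse the "Accept-Language" header, and the function name order_by_quality is hypothetical.

```python
def order_by_quality(weighted_ranges):
    """Return the ranges in descending order of quality weight.

    weighted_ranges is assumed to be a list of (language_range, q) pairs.
    Python's sort is stable, so ranges with equal weights keep their
    original relative order.
    """
    ranked = sorted(weighted_ranges, key=lambda pair: pair[1], reverse=True)
    return [language_range for language_range, _ in ranked]
```

For example, the pairs ("fr", 0.8), ("en", 1.0), ("zh-Hant", 0.5) yield the simple list "en, fr, zh-Hant".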
3. Types of Matching 3. Types of Matching
Matching language ranges to language tags can be done in many Matching language ranges to language tags can be done in many
different ways. This section describes three such matching schemes, different ways. This section describes three such matching schemes,
as well as the considerations for choosing between them. Protocols as well as the considerations for choosing between them. Protocols
and specifications requiring conformance to this specification MUST and specifications requiring conformance to this specification MUST
clearly indicate the particular mechanism used in selecting or clearly indicate the particular mechanism used in selecting or
matching language tags. matching language tags.
There are two types of matching scheme in this document. A matching There are two types of matching scheme in this document. A matching
scheme that produces zero or more matching language tags is called scheme that produces zero or more matching language tags is called
"filtering". A matching scheme that produces exactly one match for a "filtering". A matching scheme that produces exactly one match for a
given request is called "lookup". given request is called "lookup".
3.1. Choosing a Type of Matching 3.1. Choosing a Matching Scheme
Applications, protocols, and specifications are faced with the Applications, protocols, and specifications are faced with the
decision of what type of matching to use. Sometimes, different decision of what type of matching to use. Sometimes, different
styles of matching are suited to different kinds of processing within styles of matching are suited to different kinds of processing within
a particular application or protocol. a particular application or protocol.
This document describes three types of matching: This document describes three matching schemes:
1. Basic Filtering (Section 3.3.1) matches a language priority list 1. Basic Filtering (Section 3.3.1) matches a language priority list
consisting of basic language ranges (Section 2.1) to sets of consisting of basic language ranges (Section 2.1) to sets of
language tags. language tags.
2. Extended Filtering (Section 3.3.2) matches a language priority 2. Extended Filtering (Section 3.3.2) matches a language priority
list consisting of extended language ranges (Section 2.2) to sets list consisting of extended language ranges (Section 2.2) to sets
of language tags. of language tags.
3. Lookup (Section 3.4) matches a language priority list consisting 3. Lookup (Section 3.4) matches a language priority list consisting
of basic language ranges to sets of language tags to find the one of basic language ranges to sets of language tags to find the one
_exact_ language tag that best matches the range. _exact_ language tag that best matches the range.
Filtering can be used to produce a set of results (such as a Filtering can be used to produce a set of results (such as a
collection of documents) by comparing the user's preferences to a set collection of documents) by comparing the user's preferences to a set
of language tags. For example, when performing a search, one might of language tags. For example, when performing a search, filtering
use filtering to limit the results to items tagged as being in the can be used to limit the results to items tagged as being in the
French language. Filtering can also be used when deciding whether to French language. Filtering can also be used when deciding whether to
perform a language-sensitive process on some content. For example, a perform a language-sensitive process on some content. For example, a
process might cause paragraphs whose language tag matched the process might cause paragraphs whose language tag matched the
language range "nl" to be displayed in italics within a document. language range "nl" (Dutch) to be displayed in italics within a
document.
Lookup produces the single result that best matches the user's Lookup produces the single result that best matches the user's
preferences from the list of available tags, so it is useful in cases preferences from the list of available tags, so it is useful in cases
in which a single item is required (and for which only a single item in which a single item is required (and for which only a single item
can be returned). For example, if a process were to insert a human can be returned). For example, if a process were to insert a human
readable error message into a protocol header, it might select the readable error message into a protocol header, it might select the
text based on the user's language priority list. Since the process text based on the user's language priority list. Since the process
can return only one item, it must choose a single item and it must can return only one item, it is forced to choose a single item and it
return some item, even if none of the content's language tags match has to return some item, even if none of the content's language tags
the language priority list supplied by the user. match the language priority list supplied by the user.
3.2. Implementation Considerations 3.2. Implementation Considerations
Language tag matching is a tool, and does not by itself specify a Language tag matching is a tool, and does not by itself specify a
complete procedure for the use of language tags. Such procedures are complete procedure for the use of language tags. Such procedures are
intimately tied to the application protocol in which they occur. intimately tied to the application protocol in which they occur.
When specifying a protocol operation using matching, the protocol When specifying a protocol operation using matching, the protocol
MUST specify: MUST specify:
o Which type(s) of language tag matching it uses o Which type(s) of language tag matching it uses
skipping to change at page 8, line 49 skipping to change at page 8, line 50
to map grandfathered and obsolete tags or subtags into modern to map grandfathered and obsolete tags or subtags into modern
equivalents. equivalents.
Applications, protocols, or specifications that canonicalize ranges Applications, protocols, or specifications that canonicalize ranges
MUST either perform matching operations with both the canonical and MUST either perform matching operations with both the canonical and
original (unmodified) form of the range or MUST also canonicalize original (unmodified) form of the range or MUST also canonicalize
each tag for the purposes of comparison. each tag for the purposes of comparison.
Note that canonicalizing language ranges makes certain operations Note that canonicalizing language ranges makes certain operations
impossible. For example, an implementation that canonicalizes the impossible. For example, an implementation that canonicalizes the
language range "art-lojban" to use the more modern "jbo" cannot be language range "art-lojban" (artificial language, lojban variant) to
used to select just the items with the older tag. use the more modern "jbo" (Lojban) cannot be used to select just the
items with the older tag.
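One way to satisfy the first option above (matching with both the canonical and the original form of the range) is sketched below. The CANONICAL table holds only the single mapping mentioned in this section and stands in for whatever mapping data an implementation actually uses; the function names are hypothetical, and the prefix comparison is a simplified stand-in for the chosen matching scheme.

```python
# Hypothetical canonicalization table; only the example from the text.
CANONICAL = {"art-lojban": "jbo"}


def forms_to_match(language_range):
    """Return the original range plus its canonical form, if different."""
    original = language_range.lower()
    canonical = CANONICAL.get(original, original)
    if canonical == original:
        return [original]
    return [original, canonical]


def matches_any_form(language_range, tag):
    """Prefix-match the tag against every form of the range."""
    tag = tag.lower()
    return any(
        tag == form or tag.startswith(form + "-")
        for form in forms_to_match(language_range)
    )
```

With this approach the range "art-lojban" selects items tagged "art-lojban" as well as items tagged "jbo". The range "jbo" still does not select items bearing the older tag unless the tags are canonicalized as well.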
Applications, protocols, or specifications that use basic ranges Applications, protocols, or specifications that use basic ranges
might sometimes receive extended language ranges instead. An might sometimes receive extended language ranges instead. An
application, protocol, or specification MUST choose to: a) map application, protocol, or specification MUST choose to: a) map
extended language ranges to basic ranges using the algorithm below, extended language ranges to basic ranges using the algorithm below,
b) reject any extended language ranges in the language priority list b) reject any extended language ranges in the language priority list
that are not valid basic language ranges, or c) treat each extended that are not valid basic language ranges, or c) treat each extended
language range as if it were a basic language range, which will have language range as if it were a basic language range, which will have
the same result as ignoring them, since these ranges will won't match the same result as ignoring them, since these ranges will not match
any valid language tags. any valid language tags.
An extended language range is mapped to a basic language range as An extended language range is mapped to a basic language range as
follows: if the first subtag is a '*' then the entire range is follows: if the first subtag is a '*' then the entire range is
treated as "*", otherwise each wildcard subtag is removed. For treated as "*", otherwise each wildcard subtag is removed. For
example, if the language range were "en-*-US", then the range would example, the extended language range "en-*-US" maps to "en-US"
be mapped to "en-US". (English, United States).
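The mapping just described can be sketched in a few lines. The function name to_basic_range is an assumption made for illustration.

```python
def to_basic_range(extended_range):
    """Map an extended language range to a basic language range.

    If the first subtag is "*", the whole range becomes "*";
    otherwise every wildcard subtag is removed.
    """
    subtags = extended_range.split("-")
    if subtags[0] == "*":
        return "*"
    return "-".join(subtag for subtag in subtags if subtag != "*")
```

Under this sketch "en-*-US" maps to "en-US" and "*-CH" maps to "*".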
Applications, protocols, or specifications, in addressing their Applications, protocols, or specifications, in addressing their
particular requirements, can offer pre-processing or configuration particular requirements, can offer pre-processing or configuration
options. For example, an implementation could allow a user to options. For example, an implementation could allow a user to
associate or map a particular language range to a different value. associate or map a particular language range to a different value.
Such a user might wish to associate the language range subtags 'nn' Such a user might wish to associate the language range subtags 'nn'
(Nynorsk Norwegian) and 'nb' (Bokmal Norwegian) with the more general (Nynorsk Norwegian) and 'nb' (Bokmal Norwegian) with the more general
subtag 'no' (Norwegian). Or perhaps the user could associate the subtag 'no' (Norwegian). Or perhaps a user would want to associate
range "zh-Hans" (Chinese as written in the Simplified script) with requests for the range "zh-Hans" (Chinese as written in the
the language tag "zh-CN" (Chinese as used in China, where the Simplified script) with content bearing the language tag "zh-CN"
Simplified script is predominant) because content is available with (Chinese as used in China, where the Simplified script is
that tag. Documentation on how the ranges or tags are altered, predominant). Documentation on how the ranges or tags are altered,
prioritized, or compared in the subsequent match in such an prioritized, or compared in the subsequent match in such an
implementation will assist users in making the best configuration implementation will assist users in making these types of
choices. configuration choices.
3.3. Filtering 3.3. Filtering
Filtering is used to select the set of language tags that matches a Filtering is used to select the set of language tags that matches a
given language priority list. It is called "filtering" because this given language priority list. It is called "filtering" because this
set might contain no items at all or it might return an arbitrarily set might contain no items at all or it might return an arbitrarily
large number of matching items: as many items as match the language large number of matching items: as many items as match the language
priority list, thus "filtering out" the non-matching items. priority list, thus "filtering out" the non-matching items.
In filtering, each language range represents the _least_ specific In filtering, each language range represents the _least_ specific
language tag (that is, the language tag with the fewest subtags) language tag (that is, the language tag with the fewest subtags)
that is an acceptable match. All of the language tags in that is an acceptable match. All of the language tags in
the matching set of tags will have an equal or greater number of the matching set of tags will have an equal or greater number of
subtags than the language range. Every non-wildcard subtag in the subtags than the language range. Every non-wildcard subtag in the
language range will appear in every one of the matching language language range will appear in every one of the matching language
tags. For example, if the language priority list consists of the tags. For example, if the language priority list consists of the
range "de-CH", one might see tags such as "de-CH-1996" but one will range "de-CH" (German as used in Switzerland), one might see tags
never see a tag such as "de" (because the 'CH' subtag is missing). such as "de-CH-1996" (German as used in Switzerland, orthography of
1996) but one will never see a tag such as "de" (because the 'CH'
subtag is missing).
If the language priority list (see Section 2.3) contains more than If the language priority list (see Section 2.3) contains more than
one range, the content returned is typically ordered in descending one range, the content returned is typically ordered in descending
level of preference, but it MAY be unordered, according to the needs level of preference, but it MAY be unordered, according to the needs
of the application or protocol. of the application or protocol.
Some examples of applications where filtering might be appropriate Some examples of applications where filtering might be appropriate
include: include:
o Applying a style to sections of a document in a particular set of o Applying a style to sections of a document in a particular set of
skipping to change at page 10, line 25 skipping to change at page 10, line 29
o Displaying the set of documents containing a particular set of o Displaying the set of documents containing a particular set of
keywords written in a specific set of languages. keywords written in a specific set of languages.
o Selecting all email items written in a specific set of languages. o Selecting all email items written in a specific set of languages.
o Selecting audio files spoken in a particular language. o Selecting audio files spoken in a particular language.
Filtering seems to imply that there is a semantic relationship Filtering seems to imply that there is a semantic relationship
between language tags that share the same prefix. While this is between language tags that share the same prefix. While this is
often the case, it is not always true and users should note that the often the case, it is not always true: the language tags that match a
set of language tags that match a specific language range do not specific language range do not necessarily represent mutually
necessarily represent mutually intelligible languages. intelligible languages.
3.3.1. Basic Filtering 3.3.1. Basic Filtering
Basic filtering uses basic language ranges. Each basic language Basic filtering compares basic language ranges to language tags.
range in the language priority list is considered in turn, according Each basic language range in the language priority list is considered
to priority. A language range matches a particular language tag if, in turn, according to priority. A language range matches a
in a case-insensitive comparison, it exactly equals the tag, or if it particular language tag if, in a case-insensitive comparison, it
exactly equals a prefix of the tag such that the first character exactly equals the tag, or if it exactly equals a prefix of the tag
following the prefix is "-". For example, the language-range "de-de" such that the first character following the prefix is "-". For
matches the language tag "de-DE-1996", but not the language tags "de- example, the language-range "de-de" (German as used in Germany)
Deva" or "de-Latn-DE". matches the language tag "de-DE-1996" (German as used in Germany,
orthography of 1996), but not the language tags "de-Deva" (German as
written in the Devanagari script) or "de-Latn-DE" (German, Latin
script, as used in Germany).
The special range "*" in a language priority list matches any tag. A The special range "*" in a language priority list matches any tag. A
protocol which uses language ranges MAY specify additional rules protocol which uses language ranges MAY specify additional rules
about the semantics of "*"; for instance, HTTP/1.1 [RFC2616] about the semantics of "*"; for instance, HTTP/1.1 [RFC2616]
specifies that the range "*" matches only languages not matched by specifies that the range "*" matches only languages not matched by
any other range within an "Accept-Language" header. any other range within an "Accept-Language" header.
Basic filtering is identical to the type of matching described in Basic filtering is identical to the type of matching described in
[RFC3066], Section 2.5 (Language-range). [RFC3066], Section 2.5 (Language-range).
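The prefix rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of either draft; the function names basic_match and basic_filter are hypothetical, and the sketch assumes no protocol-specific rules for "*" (such as those of HTTP/1.1) apply.

```python
def basic_match(language_range: str, tag: str) -> bool:
    """Return True if a basic language range matches a language tag."""
    if language_range == "*":
        return True  # the special range matches any tag
    rng, candidate = language_range.lower(), tag.lower()
    # Either an exact match, or a prefix that is followed directly by "-".
    return candidate == rng or candidate.startswith(rng + "-")


def basic_filter(priority_list, tags):
    """Collect matching tags, considering each range in priority order."""
    matched = []
    for language_range in priority_list:
        for tag in tags:
            if basic_match(language_range, tag) and tag not in matched:
                matched.append(tag)
    return matched
```

With the tags from the example, basic_filter(["de-de"], ["de-DE-1996", "de-Deva", "de-Latn-DE"]) keeps only "de-DE-1996".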
skipping to change at page 12, line 5 skipping to change at page 12, line 10
Subtags not specified, including those at the end of the language Subtags not specified, including those at the end of the language
range, are thus treated as if assigned the wildcard value '*'. Much range, are thus treated as if assigned the wildcard value '*'. Much
like basic filtering, extended filtering selects content with like basic filtering, extended filtering selects content with
arbitrarily long tags that share the same initial subtags as the arbitrarily long tags that share the same initial subtags as the
language range. In addition, extended filtering selects language language range. In addition, extended filtering selects language
tags that contain any intermediate subtags not specified in the tags that contain any intermediate subtags not specified in the
language range. For example, the extended language range "de-*-DE" language range. For example, the extended language range "de-*-DE"
(or its synonym "de-DE") matches all of the following tags: (or its synonym "de-DE") matches all of the following tags:
de-DE de-DE (German, as used in Germany)
de-Latn-DE de-de (German, as used in Germany)
de-Latf-DE de-Latn-DE (Latin script)
de-de de-Latf-DE (Fraktur variant of Latin script)
de-DE-x-goethe de-DE-x-goethe (private use subtag)
de-Latn-DE-1996 de-Latn-DE-1996 (orthography of 1996)
de-Deva-DE de-Deva-DE (Devanagari script)
The same range does not match any of the following tags for the The same range does not match any of the following tags for the
reasons shown: reasons shown:
de (missing 'DE') de (missing 'DE')
de-x-DE (singleton 'x' occurs before 'DE') de-x-DE (singleton 'x' occurs before 'DE')
de-Deva ('Deva' not equal to 'DE') de-Deva ('Deva' not equal to 'DE')
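One way to reproduce the matches and non-matches listed above is sketched below in Python. The step-by-step rules for extended filtering are not shown in this excerpt, so the procedure here is an assumption reconstructed from the examples: non-initial wildcards are ignored, unmatched intermediate subtags in the tag are skipped, and a singleton in the tag stops the search. The name extended_match is hypothetical.

```python
def extended_match(language_range: str, tag: str) -> bool:
    """Return True if an extended language range matches a language tag."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")

    # The first subtag must match, unless the range starts with a wildcard.
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False

    position = 1  # next subtag of the tag to examine
    for wanted in range_subtags[1:]:
        if wanted == "*":
            continue  # wildcards after the first position are ignored
        while True:
            if position >= len(tag_subtags):
                return False  # tag ran out before the subtag was found
            current = tag_subtags[position]
            position += 1
            if current == wanted:
                break  # found it; move on to the next range subtag
            if len(current) == 1:
                return False  # a singleton blocks any further skipping
    return True
```

Under these assumptions, "de-*-DE" and "de-DE" give the same result for every tag in the two lists above.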
Note: [RFC3066bis] defines each type of subtag (language, script, Note: [RFC3066bis] defines each type of subtag (language, script,
region, and so forth) according to position, size, and content. This region, and so forth) according to position, size, and content. This
means that subtags in a language range can only match specific types means that subtags in a language range can only match specific types
of subtags in a language tag. For example, a subtag such as 'Latn' of subtags in a language tag. For example, a subtag such as 'Latn'
is always a script subtag (unless it follows a singleton) while a is always a script subtag (unless it follows a singleton) while a
subtag such as 'nedis' can only match the equivalent variant subtag. subtag such as 'nedis' can only match the equivalent variant subtag.
One such difference is that two-letter subtags in initial position Two-letter subtags in initial position have a different type
have a different type (language) than two-letter subtags in later (language) than two-letter subtags in later positions (region). This
positions (region). This is the reason why a wildcard in the is the reason why a wildcard in the extended language range is
extended language range is significant in the first position and significant in the first position but is ignored in all other
subsequently ignored. positions.
3.4. Lookup 3.4. Lookup
Lookup is used to select the single language tag that best matches Lookup is used to select the single language tag that best matches
the language priority list for a given request. When performing the language priority list for a given request. When performing
lookup, each language range in the language priority list is lookup, each language range in the language priority list is
considered in turn, according to priority. By contrast with considered in turn, according to priority. By contrast with
filtering, each language range represents the _most_ specific tag filtering, each language range represents the _most_ specific tag
which is an acceptable match. The first matching tag found, which is an acceptable match. The first matching tag found,
according to the user's priority, is considered the closest match and according to the user's priority, is considered the closest match and
is the item returned. For example, if the language range is "de-ch", is the item returned. For example, if the language range is "de-ch",
a lookup operation can produce content with the tags "de" or "de-CH" a lookup operation can produce content with the tags "de" or "de-CH"
but never content with the tag "de-CH-1996". If no language tag but never content with the tag "de-CH-1996". If no language tag
matches the request, the "default" value is returned. matches the request, the "default" value is returned.
For example, if an application inserts some dynamic content into a For example, if an application inserts some dynamic content into a
document, returning an empty string if there is no exact match is not document, returning an empty string if there is no exact match is not
an option. Instead, the application "falls back" until it finds a an option. Instead, the application "falls back" until it finds a
matching language tag associated with a suitable piece of content to matching language tag associated with a suitable piece of content to
insert. Examples of lookup might include: insert. Some applications of lookup include:
o Selection of a template containing the text for an automated email o Selection of a template containing the text for an automated email
response. response.
o Selection of an item containing some text for inclusion in a o Selection of an item containing some text for inclusion in a
particular Web page. particular Web page.
o Selection of a string of text for inclusion in an error log. o Selection of a string of text for inclusion in an error log.
o Selection of an audio file to play as a prompt in a phone system. o Selection of an audio file to play as a prompt in a phone system.
In the lookup scheme, the language range is progressively truncated In the lookup scheme, the language range is progressively truncated
from the end until a matching language tag is located. Single letter from the end until a matching language tag is located. Single letter
or digit subtags (including both the letter 'x' which introduces or digit subtags (including both the letter 'x' which introduces
private-use sequences, and the subtags that introduce extensions) are private-use sequences, and the subtags that introduce extensions) are
removed at the same time as their closest trailing subtag. For removed at the same time as their closest trailing subtag. For
example, starting with the range "zh-Hant-CN-x-private1-private2", example, starting with the range "zh-Hant-CN-x-private1-private2"
the lookup progressively searches for content as shown below: (Chinese, Traditional script, China, two private use tags) the lookup
progressively searches for content as shown below:
Range to match: zh-Hant-CN-x-private1-private2 Range to match: zh-Hant-CN-x-private1-private2
1. zh-Hant-CN-x-private1-private2 1. zh-Hant-CN-x-private1-private2
2. zh-Hant-CN-x-private1 2. zh-Hant-CN-x-private1
3. zh-Hant-CN 3. zh-Hant-CN
4. zh-Hant 4. zh-Hant
5. zh 5. zh
6. (default) 6. (default)
Figure 3: Example of a Lookup Fallback Pattern Figure 3: Example of a Lookup Fallback Pattern
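The truncation sequence in Figure 3 can be generated mechanically. The following Python sketch is illustrative only; the name fallback_chain is hypothetical, and the final "(default)" step is left to the caller.

```python
def fallback_chain(language_range: str) -> list:
    """List the ranges tried during lookup, from most to least specific."""
    subtags = language_range.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()  # drop the last subtag
        # A single letter or digit subtag left at the end is removed
        # together with the subtag that followed it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain
```

For the range "zh-Hant-CN-x-private1-private2" this yields the five tags of steps 1 through 5 in the figure, in the same order.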
This allows some flexibility in finding a match. For example, lookup This fallback behavior allows some flexibility in finding a match.
provides better results for cases in which content is not available Without fallback, the default content would be returned immediately
that exactly matches the user request than if the default language if exactly matching content is unavailable. With fallback, a result
for the system or content were returned immediately. Language more closely matching the user request can be provided.
material is sometimes sparsely populated, so an item might not be
available at every level of tag granularity. "Falling back" through
the subtag sequence provides more opportunity to find a match between
available language tags and the user's request.
Extensions and unrecognized private-use subtags might be unrelated to Extensions and unrecognized private-use subtags might be unrelated to
a particular application of lookup. Since these subtags come at the a particular application of lookup. Since these subtags come at the
end of the subtag sequence, they are removed first during the end of the subtag sequence, they are removed first during the
fallback process and usually pose no barrier to interoperability. fallback process and usually pose no barrier to interoperability.
However, an implementation MAY remove these from ranges prior to However, an implementation MAY remove these from ranges prior to
performing the lookup (provided the implementation also removes them performing the lookup (provided the implementation also removes them
from the tags being compared). Such modification is internal to the from the tags being compared). Such modification is internal to the
implementation and applications, protocols, or specifications SHOULD implementation and applications, protocols, or specifications SHOULD
NOT remove or modify subtags in content that they return or forward, NOT remove or modify subtags in content that they return or forward,
because this removes information that might be used elsewhere. because this removes information that can be used elsewhere.
The special language range "*" matches any language tag. In the The special language range "*" matches any language tag. In the
lookup scheme, this range does not convey enough information by lookup scheme, this range does not convey enough information by
itself to determine which language tag is most appropriate, since it itself to determine which language tag is most appropriate, since it
matches everything. If the language range "*" is followed by other matches everything. If the language range "*" is followed by other
language ranges, it is skipped. If the language range "*" is the language ranges, it is skipped. If the language range "*" is the
only one in the language priority list or if no other language range only one in the language priority list or if no other language range
follows, the default value is computed and returned. follows, the default value is computed and returned.
In some cases, the language priority list might contain one or more In some cases, the language priority list can contain one or more
extended language ranges (as, for example, when the same language extended language ranges (as, for example, when the same language
priority list is used as input for both lookup and filtering priority list is used as input for both lookup and filtering
operations). Wildcard values in an extended language range normally operations). Wildcard values in an extended language range normally
match any value that can occur in that position in a language tag. match any value that can occur in that position in a language tag.
Since only one item can be returned for any given lookup request, Since only one item can be returned for any given lookup request,
wildcards in a language range have to be processed in a consistent wildcards in a language range have to be processed in a consistent
manner or the same request will produce widely varying results. manner or the same request will produce widely varying results.
Applications, protocols, or specifications that accept extended Applications, protocols, or specifications that accept extended
language ranges MUST define which item is returned when more than one language ranges MUST define which item is returned when more than one
item matches the extended language range. item matches the extended language range.
For example, an implementation could return the matching tag that is For example, an implementation could map the extended language ranges
first in ASCII-order. If the language range were "*-CH" and the set to basic ranges. Another possibility would be for an implementation
of tags included "de-CH", "fr-CH", and "it-CH", then the tag "de-CH" to return the matching tag that is first in ASCII-order. If the
would be returned. Another possibility would be for an language range were "*-CH" ('CH' represents Switzerland) and the set
implementation to map the extended language ranges to basic ranges. of tags included "de-CH" (German as used in Switzerland), "fr-CH"
(French, Switzerland), and "it-CH" (Italian, Switzerland), then the
tag "de-CH" would be returned.
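As a small illustration of the ASCII-order tie-break mentioned above, the Python sketch below picks one tag from a set that has already been matched against an extended range. The name first_in_ascii_order is hypothetical, the matching step itself is assumed to have happened elsewhere, and this is only one of the policies an implementation could define.

```python
def first_in_ascii_order(matching_tags):
    """Return the matching tag that sorts first, or None if there is none.

    Python compares strings by code point, which for ASCII-only tags
    is the same as ASCII order.
    """
    return min(matching_tags) if matching_tags else None
```

Applied to the tags "fr-CH", "it-CH", and "de-CH", the function returns "de-CH", whatever order the tags are supplied in.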
3.4.1. Default Values 3.4.1. Default Values
Each application, protocol, or specification MUST define the Each application, protocol, or specification that uses lookup MUST
defaulting behavior when no tag matches the language priority list. define the defaulting behavior when no tag matches the language
What this action consists of strongly depends on how lookup is being priority list. What this action consists of strongly depends on how
applied. Some examples of defaulting behavior might include: lookup is being applied. Some examples of defaulting behavior
include:
o return an item with no language tag or an item of a non-linguistic o return an item with no language tag or an item of a non-linguistic
nature, such as an image or sound nature, such as an image or sound
o return a null string as the language tag value, in cases where the o return a null string as the language tag value, in cases where the
protocol permits the empty value (see, for example, "xml:lang" in protocol permits the empty value (see, for example, "xml:lang" in
[XML10]) [XML10])
o return a particular language tag designated for the operation o return a particular language tag designated for the operation
o return the language tag "i-default" (see: [RFC2277]) o return the language tag "i-default" (see: [RFC2277])
o return an error condition or error message o return an error condition or error message
o return a list of available languages for the user to select from o return a list of available languages for the user to select from
When performing lookup using a language priority list, the When performing lookup using a language priority list, the
progressive search MUST process each language range in the list progressive search MUST process each language range in the list
before seeking or calculating the default. before seeking or calculating the default.
The default value MAY be calculated and might include additional The default value MAY be calculated or include additional searching
searching or matching. Applications, protocols, or specifications or matching. Applications, protocols, or specifications can specify
can specify different ways in which users can specify or override the different ways in which users can specify or override the defaults.
defaults.
One common way to provide for a default is to allow a specific One common way to provide for a default is to allow a specific
language range to be set as the default for a specific type of language range to be set as the default for a specific type of
request. If this approach is chosen, this language range MUST be request. If this approach is chosen, this language range MUST be
treated as if it were appended to the end of the language priority treated as if it were appended to the end of the language priority
list as a whole, rather than after each item in the language priority list as a whole, rather than after each item in the language priority
list. The application, protocol, or specification MUST also define list. The application, protocol, or specification MUST also define
the defaulting behavior if that search fails to find a matching tag the defaulting behavior if that search fails to find a matching tag
or item. or item.
For example, if a particular user's language priority list were For example, if a particular user's language priority list is "fr-FR,
"fr-FR, zh-Hant" and the program doing the matching had a default zh-Hant" (French as used in France followed by Chinese as written in
language range of "ja-JP", the program would search as follows: the Traditional script) and the program doing the matching had a
default language range of "ja-JP" (Japanese as used in Japan), then
the program searches as follows:
1. fr-FR 1. fr-FR
2. fr 2. fr
3. zh-Hant // next language 3. zh-Hant // next language
4. zh 4. zh
5. ja-JP // now searching for the default content 5. ja-JP // now searching for the default content
6. ja 6. ja
7. (implementation defined default) 7. (implementation defined default)
Figure 4: Lookup Using a Language Priority List Figure 4: Lookup Using a Language Priority List
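The search order in Figure 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the names truncations, search_order, and lookup are hypothetical, only basic language ranges are handled, the range "*" is simply skipped, and a return value of None stands for the implementation-defined default of step 7.

```python
def truncations(language_range):
    """Yield a range and its progressively truncated forms."""
    subtags = language_range.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()
        # Remove a trailing single-character subtag along with its successor.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()


def search_order(priority_list, default_range=None):
    """Return every range tried, in order, for a language priority list."""
    ranges = [r for r in priority_list if r != "*"]
    if default_range is not None:
        # The default is appended once, after the whole priority list.
        ranges.append(default_range)
    return [step for r in ranges for step in truncations(r)]


def lookup(priority_list, available_tags, default_range=None):
    """Return the first available tag found, or None if nothing matches."""
    available = {tag.lower(): tag for tag in available_tags}
    for candidate in search_order(priority_list, default_range):
        found = available.get(candidate.lower())
        if found is not None:
            return found
    return None  # caller applies its implementation-defined default
```

For example, search_order(["fr-FR", "zh-Hant"], "ja-JP") produces steps 1 through 6 of the figure, and lookup(["fr-FR", "zh-Hant"], ["en", "zh", "ja"], "ja-JP") returns "zh" because it is reached before the default range is tried.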
4. Other Considerations 4. Other Considerations
When working with language ranges and matching schemes, there are When working with language ranges and matching schemes, there are
some additional points that may influence the choice of either. some additional points that can influence the choice of either.
4.1. Choosing Language Ranges 4.1. Choosing Language Ranges
Users indicate their language preferences via the choice of a Users indicate their language preferences via the choice of a
language range or the list of language ranges in a language priority language range or the list of language ranges in a language priority
list. The type of matching affects what the best choice is for a list. The type of matching affects what the best choice is for a
user. user.
Most matching schemes make no attempt to process the semantic meaning Most matching schemes make no attempt to process the semantic meaning
of the subtags. The language range is compared, in a case- of the subtags. The language range is compared, in a case-
skipping to change at page 16, line 37 skipping to change at page 16, line 37
working with content that might use the older form, the user might working with content that might use the older form, the user might
want to include both the new and old forms in a language priority want to include both the new and old forms in a language priority
list. For example, the tag "art-lojban" is deprecated. The subtag list. For example, the tag "art-lojban" is deprecated. The subtag
'jbo' is supposed to be used instead, so the user might use it to 'jbo' is supposed to be used instead, so the user might use it to
form the language range. Or the user might include both in a form the language range. Or the user might include both in a
language priority list: "jbo, art-lojban". language priority list: "jbo, art-lojban".
Users SHOULD avoid subtags that add no distinguishing value to a Users SHOULD avoid subtags that add no distinguishing value to a
language range. When filtering, the fewer the number of subtags that language range. When filtering, the fewer the number of subtags that
appear in the language range, the more content the range will appear in the language range, the more content the range will
probably match, while in lookup unnecessary subtags might cause probably match, while in lookup unnecessary subtags can cause
"better", more-specific content to be skipped in favor of less "better", more-specific content to be skipped in favor of less
specific content. For example, the range "de-Latn-DE" would return specific content. For example, the range "de-Latn-DE" returns
content tagged "de" instead of content tagged "de-DE", even though content tagged "de" instead of content tagged "de-DE", even though
the latter is probably a better match. the latter is probably a better match.
Whether a subtag adds distinguishing value can depend on the context Whether a subtag adds distinguishing value can depend on the context
of the request. For example, a user who reads both Simplified and of the request. For example, a user who reads both Simplified and
Traditional Chinese, but who prefers Simplified, might use the range Traditional Chinese, but who prefers Simplified, might use the range
"zh" for filtering (matching all items that user can read) but "zh- "zh" for filtering (matching all items that user can read) but "zh-
Hans" for lookup (making sure that user gets the preferred form if Hans" for lookup (making sure that user gets the preferred form if
it's available, but the fallback to "zh" will still work). On the it's available, but the fallback to "zh" will still work). On the
other hand, content in this case should be labeled as "zh-Hans" (or other hand, content in this case ought to be labeled as "zh-Hans" (or
"zh-Hant" if that applies) for filtering, but for lookup, if there is "zh-Hant" if that applies) for filtering, while for lookup, if there
either "zh-Hans" content or "zh-Hant" content, then one of them (the is either "zh-Hans" content or "zh-Hant" content, one of them (the
one considered 'default') should also be available under a simple one considered 'default') also ought to be made available with the
"zh". Note that the user can create a language priority list "zh- simple "zh". Note that the user can create a language priority list
Hans, zh" that delivers the best possible results for both schemes. "zh-Hans, zh" that delivers the best possible results for both
If the user cannot be sure which scheme is being used (or if more schemes. If the user cannot be sure which scheme is being used (or
than one might be applied to a given request), the user SHOULD if more than one might be applied to a given request), the user
specify the most specific (largest number of subtags) range first and SHOULD specify the most specific (largest number of subtags) range
then supply shorter prefixes later in the list to ensure that first and then supply shorter prefixes later in the list to ensure
filtering returns a complete set of tags. that filtering returns a complete set of tags.
Many languages are written predominantly in a single script. This is Many languages are written predominantly in a single script. This is
usually recorded in the Suppress-Script field in that language usually recorded in the Suppress-Script field in that language
subtag's registry entry. For these languages, script subtags SHOULD subtag's registry entry. For these languages, script subtags SHOULD
NOT be used to form a language range. Thus the language range "en- NOT be used to form a language range. Thus the language range "en-
Latn" is inappropriate in most cases (because the vast majority of Latn" is inappropriate in most cases (because the vast majority of
English documents are written in the Latin script and thus the 'en' English documents are written in the Latin script and thus the 'en'
language subtag has a Suppress-Script field for 'Latn' in the language subtag has a Suppress-Script field for 'Latn' in the
registry). registry).
skipping to change at page 17, line 44 skipping to change at page 17, line 44
Selecting language tags using language ranges requires some Selecting language tags using language ranges requires some
understanding by users of what they are selecting. The meaning of understanding by users of what they are selecting. The meaning of
the various subtags in a language range is identical to their the various subtags in a language range is identical to their
meaning in a language tag (see Section 4.2 in [RFC3066bis]), with the meaning in a language tag (see Section 4.2 in [RFC3066bis]), with the
addition that the wildcard "*" represents any matching sequence of addition that the wildcard "*" represents any matching sequence of
values. values.
4.3. Considerations for Private Use Subtags 4.3. Considerations for Private Use Subtags
Private-use subtags require private agreement between the parties Private agreement is necessary between the parties that intend to use
that intend to use or exchange language tags that use them. They or exchange language tags that contain private-use subtags. Great
SHOULD NOT be used in content or protocols intended for general use. caution SHOULD be used in employing private-use subtags in content or
Private-use subtags are simply useless for information exchange protocols intended for general use. Private-use subtags are simply
without prior arrangement. useless for information exchange without prior arrangement.
The value and semantic meaning of private-use tags and of the subtags The value and semantic meaning of private-use tags and of the subtags
used within such a language tag are not defined. Matching private- used within such a language tag are not defined. Matching private-
use tags using language ranges or extended language ranges can result use tags using language ranges or extended language ranges can result
in unpredictable content being returned. in unpredictable content being returned.
4.4. Length Considerations for Language Ranges 4.4. Length Considerations for Language Ranges
Language ranges are very similar to language tags in terms of content Language ranges are very similar to language tags in terms of content
and usage. The same types of restrictions on length that apply to and usage. The same types of restrictions on length that can be
language tags can also apply to language ranges. See [RFC3066bis] applied to language tags can also be applied to language ranges. See
Section 4.3 (Length Considerations). [RFC3066bis] Section 4.3 (Length Considerations).
5. IANA Considerations 5. IANA Considerations
This document presents no new or existing considerations for IANA. This document presents no new or existing considerations for IANA.
6. Security Considerations 6. Security Considerations
Language ranges used in content negotiation might be used to infer Language ranges used in content negotiation might be used to infer
the nationality of the sender, and thus identify potential targets the nationality of the sender, and thus identify potential targets
for surveillance. In addition, unique or highly unusual language for surveillance. In addition, unique or highly unusual language
skipping to change at page 23, line 19 skipping to change at page 23, line 19
contributed to make this document what it is today. contributed to make this document what it is today.
The contributors to [RFC3066bis], [RFC3066] and [RFC1766], each of The contributors to [RFC3066bis], [RFC3066] and [RFC1766], each of
which is a precursor to this document, made enormous contributions which is a precursor to this document, made enormous contributions
directly or indirectly to this document and are generally responsible directly or indirectly to this document and are generally responsible
for the success of language tags. for the success of language tags.
The following people (in alphabetical order by family name) The following people (in alphabetical order by family name)
contributed to this document: contributed to this document:
Harald Alvestrand, Stephane Bortzmeyer, Jeremy Carroll, John Cowan, Harald Alvestrand, Stephane Bortzmeyer, Jeremy Carroll, Peter
Martin Duerst, Frank Ellermann, Doug Ewell, Debbie Garside, Marion Constable, John Cowan, Mark Crispin, Martin Duerst, Frank Ellermann,
Gunn, Kent Karlsson, Ira McDonald, M. Patton, Randy Presuhn, Eric van Doug Ewell, Debbie Garside, Marion Gunn, Jon Hanna, Kent Karlsson,
der Poel, Markus Scherer, and many, many others. Erkki Kolehmainen, Jukka Korpela, Ira McDonald, M. Patton, Randy
Presuhn, Eric van der Poel, Markus Scherer, Misha Wolf, and many,
many others.
Very special thanks must go to Harald Tveit Alvestrand, who Very special thanks must go to Harald Tveit Alvestrand, who
originated RFCs 1766 and 3066, and without whom this document would originated RFCs 1766 and 3066, and without whom this document would
not have been possible. not have been possible.
Authors' Addresses Authors' Addresses
Addison Phillips (editor) Addison Phillips (editor)
Yahoo! Inc. Yahoo! Inc.
 End of changes. 53 change blocks. 
139 lines changed or deleted 153 lines changed or added
