draft-ietf-ltru-matching-08.txt   draft-ietf-ltru-matching-09.txt 
Network Working Group A. Phillips, Ed. Network Working Group A. Phillips, Ed.
Internet-Draft Quest Software Internet-Draft Yahoo! Inc
Obsoletes: 3066 (if approved) M. Davis, Ed. Obsoletes: 3066 (if approved) M. Davis, Ed.
Expires: June 10, 2006 IBM Expires: August 10, 2006 Google
December 7, 2005 February 6, 2006
Matching of Language Tags Matching of Language Tags
draft-ietf-ltru-matching-08 draft-ietf-ltru-matching-09
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on June 10, 2006. This Internet-Draft will expire on August 10, 2006.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2005). Copyright (C) The Internet Society (2006).
Abstract Abstract
This document describes different mechanisms for comparing, matching, This document describes different mechanisms for comparing, matching,
and evaluating language tags. Possible algorithms for language and evaluating language tags. Possible algorithms for language
negotiation or content selection, filtering, and lookup are negotiation or content selection, filtering, and lookup are
described. This document, in combination with RFC 3066bis (replace described. This document, in combination with RFC 3066bis (replace
"3066bis" with the RFC number assigned to "3066bis" with the RFC number assigned to
draft-ietf-ltru-registry-14), replaces RFC 3066, which replaced RFC draft-ietf-ltru-registry-14), replaces RFC 3066, which replaced RFC
1766. 1766.
skipping to change at page 2, line 15 skipping to change at page 2, line 15
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. The Language Range . . . . . . . . . . . . . . . . . . . . . . 4 2. The Language Range . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Basic Language Range . . . . . . . . . . . . . . . . . . . 4 2.1. Basic Language Range . . . . . . . . . . . . . . . . . . . 4
2.2. Extended Language Range . . . . . . . . . . . . . . . . . 5 2.2. Extended Language Range . . . . . . . . . . . . . . . . . 5
2.3. The Language Priority List . . . . . . . . . . . . . . . . 7 2.3. The Language Priority List . . . . . . . . . . . . . . . . 7
3. Types of Matching . . . . . . . . . . . . . . . . . . . . . . 8 3. Types of Matching . . . . . . . . . . . . . . . . . . . . . . 8
3.1. Choosing a Type of Matching . . . . . . . . . . . . . . . 8 3.1. Choosing a Type of Matching . . . . . . . . . . . . . . . 8
3.2. Filtering . . . . . . . . . . . . . . . . . . . . . . . . 9 3.2. Filtering . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2.1. Filtering with Basic Language Ranges . . . . . . . . . 10 3.2.1. Filtering with Basic Language Ranges . . . . . . . . . 11
3.2.2. Filtering with Extended Language Ranges . . . . . . . 11 3.2.2. Filtering with Extended Language Ranges . . . . . . . 11
3.2.3. Scored Filtering . . . . . . . . . . . . . . . . . . . 11 3.2.3. Scored Filtering . . . . . . . . . . . . . . . . . . . 11
3.3. Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.3. Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4. Other Considerations . . . . . . . . . . . . . . . . . . . . . 18 4. Other Considerations . . . . . . . . . . . . . . . . . . . . . 19
4.1. Choosing Language Ranges . . . . . . . . . . . . . . . . . 18 4.1. Choosing Language Ranges . . . . . . . . . . . . . . . . . 19
4.2. Meaning of Language Tags and Ranges . . . . . . . . . . . 19 4.2. Meaning of Language Tags and Ranges . . . . . . . . . . . 20
4.3. Considerations for Private Use Subtags . . . . . . . . . . 20 4.3. Considerations for Private Use Subtags . . . . . . . . . . 21
4.4. Length Considerations in Matching . . . . . . . . . . . . 21 4.4. Length Considerations in Matching . . . . . . . . . . . . 22
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24
6. Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 6. Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
7. Security Considerations . . . . . . . . . . . . . . . . . . . 25 7. Security Considerations . . . . . . . . . . . . . . . . . . . 26
8. Character Set Considerations . . . . . . . . . . . . . . . . . 26 8. Character Set Considerations . . . . . . . . . . . . . . . . . 27
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 27 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 28
9.1. Normative References . . . . . . . . . . . . . . . . . . . 27 9.1. Normative References . . . . . . . . . . . . . . . . . . . 28
9.2. Informative References . . . . . . . . . . . . . . . . . . 27 9.2. Informative References . . . . . . . . . . . . . . . . . . 28
Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 28 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 29
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 29 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 30
Intellectual Property and Copyright Statements . . . . . . . . . . 30 Intellectual Property and Copyright Statements . . . . . . . . . . 31
1. Introduction 1. Introduction
Human beings on our planet have, past and present, used a number of Human beings on our planet have, past and present, used a number of
languages. There are many reasons why one would want to identify the languages. There are many reasons why one would want to identify the
language used when presenting or requesting information. language used when presenting or requesting information.
Information about a user's language preferences commonly needs to be Information about a user's language preferences commonly needs to be
identified so that appropriate processing can be applied. For identified so that appropriate processing can be applied. For
example, the user's language preferences in a browser can be used to example, the user's language preferences in a browser can be used to
skipping to change at page 3, line 30 skipping to change at page 3, line 30
language negotiation and tag matching. language negotiation and tag matching.
This document defines a syntax (called a language range (Section 2)) This document defines a syntax (called a language range (Section 2))
for specifying a user's language preferences, as well as several for specifying a user's language preferences, as well as several
schemes for selecting or filtering content by comparing language schemes for selecting or filtering content by comparing language
ranges to the language tags [RFC3066bis] used to identify the natural ranges to the language tags [RFC3066bis] used to identify the natural
language of that content. Applications, protocols, or specifications language of that content. Applications, protocols, or specifications
will have varying needs and requirements that affect the choice of a will have varying needs and requirements that affect the choice of a
suitable matching scheme. Depending on the choice of scheme, there suitable matching scheme. Depending on the choice of scheme, there
are various options left to the implementation. Protocols that are various options left to the implementation. Protocols that
implement a matching scheme either need to choose a particular option implement a matching scheme either need to specify each particular
or indicate that the particular options is left to the specific choice or indicate the options that are left to the implementation to
implementation to decide. decide.
This document is divided into three main sections. One describes how This document is divided into three main sections. One describes how
to indicate a user's preferences using language ranges. Then a to indicate a user's preferences using language ranges. Then a
section describes various schemes for matching these ranges to a set section describes various schemes for matching these ranges to a set
of language tags in order to select specific content. There is also of language tags in order to select specific content. There is also
a section that deals with various practical considerations that apply a section that deals with various practical considerations that apply
to implementing and using these schemes. to implementing and using these schemes.
This document, in combination with [RFC3066bis] (Ed.: replace This document, in combination with [RFC3066bis] (Ed.: replace
"3066bis" globally in this document with the RFC number assigned to "3066bis" globally in this document with the RFC number assigned to
skipping to change at page 4, line 21 skipping to change at page 4, line 21
HTTP/1.1 [RFC2616] describes one such mechanism in its discussion of HTTP/1.1 [RFC2616] describes one such mechanism in its discussion of
the Accept-Language header (Section 14.4), which is used when the Accept-Language header (Section 14.4), which is used when
selecting content from servers based on the language of that content. selecting content from servers based on the language of that content.
When selecting content according to its language, it is useful to When selecting content according to its language, it is useful to
have a mechanism for identifying sets of language tags that share have a mechanism for identifying sets of language tags that share
specific attributes. This allows users to select or filter content specific attributes. This allows users to select or filter content
based on specific requirements. Such an identifier is called a based on specific requirements. Such an identifier is called a
"Language Range". "Language Range".
Language ranges are similar in structure and content to language
tags: they consist of alphanumeric "subtags" separated by hyphens,
plus a special subtag consisting of the character "*" (%x2A,
ASTERISK), which is used in ranges as a "wildcard", that is, a value
that matches any subtag.
Language tags and thus language ranges are to be treated as case- Language tags and thus language ranges are to be treated as case-
insensitive: there exist conventions for the capitalization of some insensitive: there exist conventions for the capitalization of some
of the subtags, but these MUST NOT be taken to carry meaning. of the subtags, but these MUST NOT be taken to carry meaning.
Matching of language tags to language ranges MUST be done in a case- Matching of language tags to language ranges MUST be done in a case-
insensitive manner as well. insensitive manner as well.
2.1. Basic Language Range 2.1. Basic Language Range
A "basic language range" identifies the set of content whose language A "basic language range" identifies the set of content whose language
tags begin with the same sequence of subtags. Each range consists of tags begin with the same sequence of subtags. Each range consists of
a sequence of alphanumeric subtags separated by hyphens. The basic a sequence of alphanumeric subtags separated by hyphens. The basic
language range is defined by the following the ABNF[RFC4234]: language range is defined by the following ABNF[RFC4234]:
language-range = language-tag / "*" language-range = language-tag / "*"
language-tag = 1*8alphanum *("-" 1*8alphanum) language-tag = 1*8alphanum *("-" 1*8alphanum)
alphanum = ALPHA / DIGIT alphanum = ALPHA / DIGIT
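The basic-range grammar is small enough to transcribe directly into a regular expression. The Python sketch below is illustrative only. The names `BASIC_RANGE` and `is_basic_range` are hypothetical. The sketch assumes the intended reading of the grammar: one to eight ASCII letters or digits per subtag, subtags joined by hyphens, or a lone "*".

```python
import re

# One to eight ASCII letters or digits per subtag, subtags joined by
# single hyphens; alternatively the lone wildcard "*".
BASIC_RANGE = re.compile(r"[A-Za-z0-9]{1,8}(?:-[A-Za-z0-9]{1,8})*|\*")


def is_basic_range(text: str) -> bool:
    """Return True if text is syntactically a basic language range."""
    return BASIC_RANGE.fullmatch(text) is not None
```

As the text notes, a basic range need not be well-formed. Strings such as "x-foo" or "12345678" therefore pass, because only the character and length constraints are checked.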
Basic language ranges (originally described by HTTP/1.1 [RFC2616] and Basic language ranges (originally described by HTTP/1.1 [RFC2616] and
later [RFC3066]) have the same syntax as an [RFC3066] language tag or later [RFC3066]) have the same syntax as an [RFC3066] language tag or
are the single character "*". They differ from the language tags are the single character "*". They differ from the language tags
defined in [RFC3066bis] only in that there is no requirement that defined in [RFC3066bis] only in that there is no requirement that
they be "well-formed" or be validated against the IANA Language they be "well-formed" or be validated against the IANA Language
skipping to change at page 6, line 6 skipping to change at page 6, line 6
In an extended language range, the identifier takes the form of a In an extended language range, the identifier takes the form of a
series of subtags which MUST consist of well-formed subtags or the series of subtags which MUST consist of well-formed subtags or the
special subtag "*". For example, the language range "en-*-US" special subtag "*". For example, the language range "en-*-US"
specifies a primary language of 'en', followed by any script subtag, specifies a primary language of 'en', followed by any script subtag,
followed by the region subtag 'US'. followed by the region subtag 'US'.
An extended language range can be represented by the following ABNF: An extended language range can be represented by the following ABNF:
extended-language-range = range ; a range extended-language-range = range ; a range
/ privateuse ; private-use tag / privateuse ; a private-use range
/ grandfathered ; grandfathered registrations / grandfathered ; a grandfathered registration
range = (language range = (language
["-" script] ["-" script]
["-" region] ["-" region]
*("-" variant) *("-" variant)
*("-" extension) *("-" extension)
["-" privateuse]) ["-" privateuse])
language = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code language = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
/ 4ALPHA ; reserved for future use / 4ALPHA ; reserved for future use
/ 5*8ALPHA ; registered language subtag / 5*8ALPHA ; registered language subtag
/ "*" ; ... or wildcard / "*" ; or wildcard
extlang = *2("-" 3ALPHA) ("-" ( 3ALPHA / "*")) extlang = *2("-" 3ALPHA) ("-" ( 3ALPHA / "*"))
; reserved for future use ; reserved for future use
; wildcard can only appear ; wildcard can only appear
; at the end ; at the end
script = 4ALPHA ; ISO 15924 code script = 4ALPHA ; ISO 15924 code
/ "*" ; or wildcard / "*" ; or wildcard
region = 2ALPHA ; ISO 3166 code region = 2ALPHA ; ISO 3166 code
/ 3DIGIT ; UN M.49 code / 3DIGIT ; UN M.49 code
/ "*" ; ... or wildcard / "*" ; or wildcard
variant = 5*8alphanum ; registered variants variant = 5*8alphanum ; registered variants
/ (DIGIT 3alphanum) ; / (DIGIT 3alphanum) ;
/ "*" ; ... or wildcard / "*" ; or wildcard
extension = singleton *("-" (2*8alphanum)) [ "-*" ] extension = singleton *("-" (2*8alphanum)) [ "-*" ]
; extension subtags ; extension subtags
; wildcard can only appear ; wildcard can only appear
; at the end ; at the end
singleton = "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9" singleton = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
; Single letters: x/X is reserved for private use ; single letters (except for "x") or digits
privateuse = ("x"/"X") 1*("-" (1*8alphanum)) privateuse = "x" 1*("-" (1*8alphanum))
grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum)) grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum))
; grandfathered registration ; grandfathered registration
; Note: I is the only singleton ; Note: I is the only singleton
; that starts a grandfathered tag ; that starts a grandfathered tag
alphanum = (ALPHA / DIGIT) ; letters and numbers alphanum = (ALPHA / DIGIT) ; letters and numbers
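For experimentation, the extended-range ABNF can be transcribed almost rule for rule into a Python regular expression. This is a sketch, not a reference validator. All names are hypothetical, and 4ALPHA and 5*8ALPHA are merged into a single length range. ABNF string literals are case-insensitive, so the pattern is compiled with `re.IGNORECASE`. The `re.ASCII` flag is added so that non-ASCII case folding cannot sneak in.

```python
import re

# Pieces of the extended-language-range ABNF, in lowercase; the final
# pattern is compiled case-insensitively, as ABNF literals are.
ALNUM = "[a-z0-9]"
EXTLANG = r"(?:-[a-z]{3}){0,2}-(?:[a-z]{3}|\*)"    # wildcard only at the end
LANGUAGE = rf"(?:[a-z]{{2,3}}(?:{EXTLANG})?|[a-z]{{4,8}}|\*)"
SCRIPT = r"(?:[a-z]{4}|\*)"
REGION = r"(?:[a-z]{2}|[0-9]{3}|\*)"
VARIANT = rf"(?:{ALNUM}{{5,8}}|[0-9]{ALNUM}{{3}}|\*)"
SINGLETON = "[a-wyz0-9]"                            # any letter but "x", or a digit
EXTENSION = rf"{SINGLETON}(?:-{ALNUM}{{2,8}})*(?:-\*)?"
PRIVATEUSE = rf"x(?:-{ALNUM}{{1,8}})+"
GRANDFATHERED = rf"[a-z]{{1,3}}(?:-{ALNUM}{{2,8}}){{1,2}}"

RANGE = (
    rf"{LANGUAGE}(?:-{SCRIPT})?(?:-{REGION})?"
    rf"(?:-{VARIANT})*(?:-{EXTENSION})*(?:-{PRIVATEUSE})?"
)
EXTENDED_RANGE = re.compile(
    rf"{RANGE}|{PRIVATEUSE}|{GRANDFATHERED}", re.IGNORECASE | re.ASCII
)


def is_extended_range(text: str) -> bool:
    """Return True if text matches the extended-language-range grammar."""
    return EXTENDED_RANGE.fullmatch(text) is not None
```

The grammar is ambiguous in places. For example, the "*" in "en-*-US" can be read as an extlang or as a script wildcard. Because `fullmatch` backtracks, it accepts a string if any reading succeeds, which is enough for validation but not for assigning subtags to fields.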
A field not present in the middle of an extended language range is A field not present in the middle of an extended language range is
treated as if the field contained a "*". Implementations that treated as if the field contained a "*". Implementations that
normalize extended language ranges SHOULD expand missing fields to be normalize extended language ranges SHOULD expand missing fields to be
skipping to change at page 9, line 21 skipping to change at page 9, line 21
2. Extended Range Filtering (Section 3.2.2) is used to match content 2. Extended Range Filtering (Section 3.2.2) is used to match content
using extended language ranges (Section 2.2). using extended language ranges (Section 2.2).
3. Scored Filtering (Section 3.2.3) produces an ordered set of 3. Scored Filtering (Section 3.2.3) produces an ordered set of
content using extended language ranges. It SHOULD be used when content using extended language ranges. It SHOULD be used when
the quality of the match within a specific language range is the quality of the match within a specific language range is
important, as when presenting a list of documents resulting from important, as when presenting a list of documents resulting from
a search. a search.
4. Lookup (Section 3.3) is used when each request needs to produce 4. Lookup (Section 3.3) is used when each request needs to produce
_exactly_ one piece of content. For example, if process were to _exactly_ one piece of content. For example, if a process were
insert a human readable error message into a protocol header, it to insert a human readable error message into a protocol header,
might select the text based on the user's language preference. it might select the text based on the user's language preference.
Since it can return only one item, it must choose a single item Since it can return only one item, it must choose a single item
and it must return some item, even if no content matches the and it must return some item, even if no content matches the
language priority list supplied by the user. language priority list supplied by the user.
Most types of matching in this document are designed so that Most types of matching in this document are designed so that
implementations are not required to validate or understand any of the implementations are not required to validate or understand any of the
semantics of the subtags supplied and, except for scored filtering, semantics of the subtags supplied and, except for scored filtering,
they do not need access to the IANA Language Subtag Registry (see they do not need access to the IANA Language Subtag Registry (see
Section 3 in [RFC3066bis]). This simplifies implementations and Section 3 in [RFC3066bis]). This simplifies implementations and
improves their performance. improves their performance.
Regardless of the matching scheme chosen, protocols and
implementations MAY canonicalize language tags and ranges by mapping
grandfathered and obsolete tags or subtags into modern equivalents.
If an implementation canonicalizes either ranges or tags, then the If an implementation canonicalizes either ranges or tags, then the
implementation will require the IANA Language Subtag Registry implementation will require the IANA Language Subtag Registry
information for that purpose. Implementations MAY use semantic information for that purpose. Implementations MAY also use semantic
information external to the registry when matching tags. For information external to the registry when matching tags. For
example, the primary language subtags 'nn' (Nynorsk Norwegian) and example, the primary language subtags 'nn' (Nynorsk Norwegian) and
'nb' (Bokmal Norwegian) might both be usefully matched to the more 'nb' (Bokmal Norwegian) might both be usefully matched to the more
general subtag 'no' (Norwegian). Or an implementation might infer general subtag 'no' (Norwegian). Or an implementation might infer
that content labeled "zh-CN" is more likely to match the range "zh- that content labeled "zh-CN" is more likely to match the range "zh-
Hans" than equivalent content labeled "zh-TW". Hans" than equivalent content labeled "zh-TW".
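The two optional steps above can be kept separate: canonicalization driven by registry data, and semantic matching driven by knowledge outside the registry. The sketch below assumes a tiny hand-copied excerpt of registry Preferred-Value mappings ("i-klingon" to "tlh", and "iw", "in", "ji" to "he", "id", "yi"); verify these against the registry before relying on them. The Norwegian mapping is a hypothetical implementation policy, not registry content. All function names are illustrative.

```python
# Hand-copied excerpt of registry Preferred-Value mappings (a real
# implementation would load the IANA Language Subtag Registry instead).
PREFERRED_TAG = {"i-klingon": "tlh"}                       # grandfathered tag
PREFERRED_LANGUAGE = {"iw": "he", "in": "id", "ji": "yi"}  # deprecated subtags

# Hypothetical semantic policy, external to the registry: match Bokmal
# and Nynorsk Norwegian against the more general 'no'.
GENERALIZE_LANGUAGE = {"nb": "no", "nn": "no"}


def _map_primary(tag: str, table: dict) -> str:
    """Replace the primary language subtag if the table lists it."""
    first, sep, rest = tag.partition("-")
    return table.get(first.lower(), first) + sep + rest


def canonicalize(tag: str) -> str:
    """Map grandfathered tags and deprecated primary subtags to modern forms."""
    if tag.lower() in PREFERRED_TAG:
        return PREFERRED_TAG[tag.lower()]
    return _map_primary(tag, PREFERRED_LANGUAGE)


def generalize(tag: str) -> str:
    """Apply the optional semantic mapping, e.g. 'nb-NO' -> 'no-NO'."""
    return _map_primary(tag, GENERALIZE_LANGUAGE)
```

Applying the same mapping to both ranges and tags before comparison keeps the matching step itself unchanged.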
3.2. Filtering 3.2. Filtering
Filtering is used to select the set of content that matches a given Filtering is used to select the set of content that matches a given
language priority list. It is called "filtering" because this set of language priority list. It is called "filtering" because this set of
content may contain no items at all or an arbitrarily content may contain no items at all or an arbitrarily
large number of matching items--as many as match the language range large number of matching items: as many items as match the language
used to specify the items, thus filtering out the non-matching priority list, thus "filtering out" the non-matching items.
content.
In filtering, the language range represents the _least_ specific In filtering, the language range represents the _least_ specific
(that is, having the fewest subtags) language tag that is an (that is, having the fewest subtags) language tag that is an
acceptable match. That is, all of the language tags in the set of acceptable match. That is, all of the language tags in the set of
filtered content will have an equal or greater number of subtags than filtered content will have an equal or greater number of subtags than
the language range. For example, if the language priority list the language range. For example, if the language priority list
consists of the range "de-CH", one might see matching content with consists of the range "de-CH", one might see matching content with
the tag "de-CH-1996" but one will never see a match with the tag the tag "de-CH-1996" but one will never see a match with the tag
"de". "de".
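A minimal sketch of filtering with a priority list, assuming the prefix semantics of basic ranges from Section 2.1. Under that reading, a range matches a tag when, compared case-insensitively, the tag equals the range or begins with the range followed by a hyphen, and "*" matches everything. The function names are hypothetical.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Basic-range prefix match, compared case-insensitively."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def filter_tags(priority_list: list[str], tags: list[str]) -> list[str]:
    """Collect every tag matched by any range, in priority-list order."""
    result: list[str] = []
    for language_range in priority_list:
        for tag in tags:
            if tag not in result and range_matches(language_range, tag):
                result.append(tag)
    return result
```

With the range "de-CH", the content tagged "de-CH" and "de-CH-1996" is selected, but "de" is not. The result may also be empty, as when no content is in the requested language.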
skipping to change at page 11, line 32 skipping to change at page 11, line 36
compared to the corresponding subtags in the language tag being compared to the corresponding subtags in the language tag being
examined. The subtag from the range is considered to match if it examined. The subtag from the range is considered to match if it
exactly matches the corresponding subtag in the tag or the range's exactly matches the corresponding subtag in the tag or the range's
subtag has the value "*" (which matches all subtags, including the subtag has the value "*" (which matches all subtags, including the
empty subtag). empty subtag).
Subtags not specified, including those at the end of the language Subtags not specified, including those at the end of the language
range, are assigned the wildcard value "*". This makes each range range, are assigned the wildcard value "*". This makes each range
into a prefix much like that used in basic language range matching. into a prefix much like that used in basic language range matching.
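The field-by-field comparison can be sketched in Python under a deliberately simplified model of its own. Tags and ranges are split into four fields (language, script, region, and everything else) by subtag shape. A "*" in a range fills the next field in order, and any field the range leaves unspecified becomes "*". This four-field split, the whole-remainder comparison, and the names below are assumptions for illustration. The sketch omits extlang and handles a wildcard inside the remainder only literally.

```python
import re

SCRIPT = re.compile(r"[a-z]{4}")
REGION = re.compile(r"[a-z]{2}|[0-9]{3}")


def split_fields(text: str, is_range: bool) -> list[str]:
    """Split lowercase text into [language, script, region, rest].

    In a tag, an absent field is the empty string. In a range, a "*"
    fills the next field in order and an absent field becomes "*".
    """
    subtags = text.split("-")
    fields = [subtags[0], "", "", ""]
    i = 1
    for slot, shape in ((1, SCRIPT), (2, REGION)):
        if i < len(subtags) and (
            shape.fullmatch(subtags[i]) or (is_range and subtags[i] == "*")
        ):
            fields[slot] = subtags[i]
            i += 1
        elif is_range:
            fields[slot] = "*"
    fields[3] = "-".join(subtags[i:])
    if is_range and not fields[3]:
        fields[3] = "*"
    return fields


def extended_match(language_range: str, tag: str) -> bool:
    """True if every range field is "*" (matching even "") or equals the tag's."""
    r_fields = split_fields(language_range.lower(), is_range=True)
    t_fields = split_fields(tag.lower(), is_range=False)
    return all(r == "*" or r == t for r, t in zip(r_fields, t_fields))
```

Under this model the range "de-*-DE" becomes ("de", "*", "de", "*"). It therefore accepts "de-DE" and "de-Latn-DE-1996" but rejects "de-Latn", whose region field is empty.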
For example, the extended language range "de-*-DE" matches all of the For example, the extended language range "de-*-DE" matches all of the
following tags because the unspecified variant field is expanded to following tags, in part because the unspecified variant, extension,
"*": and private-use subtags are expanded to "*":
de-DE de-DE
de-Latn-DE de-Latn-DE
de-Latf-DE de-Latf-DE
de-DE-x-goethe de-DE-x-goethe
de-Latn-DE-1996 de-Latn-DE-1996
skipping to change at page 12, line 39 skipping to change at page 12, line 43
Language subtags 'und', 'mul', and the script subtag 'Zyyy' are Language subtags 'und', 'mul', and the script subtag 'Zyyy' are
converted to "*": these subtag values represent undetermined, converted to "*": these subtag values represent undetermined,
multiple, or private-use values which are consistent with the use of multiple, or private-use values which are consistent with the use of
the wildcard. the wildcard.
For language tags that have no script subtag but whose language For language tags that have no script subtag but whose language
subtag's record in the IANA Language Subtag Registry contains the subtag's record in the IANA Language Subtag Registry contains the
field "Suppress-Script", the script element in the quintuple MUST be field "Suppress-Script", the script element in the quintuple MUST be
set to the script subtag in the Suppress-Script field. This is set to the script subtag in the Suppress-Script field. This is
necessary because [RFC3066bis] strongly recommends that users not use necessary because [RFC3066bis] strongly recommends that users not use
this subtag to form language tags and this document recommends that this subtag to form language tags and this document (see Section 4.1)
users not use it to form ranges. For example, if the script were recommends that users not use it to form ranges. Languages which
not expanded in this manner, a range such as "de-DE" would produce a have a "Suppress-Script" field in the registry are predominantly
more-distant score for content that happened to be labeled written in that single script, making the subtag redundant in forming
"de-Latn-DE" than users would expect that it should. Note that a language tag or range. Thus if the script were not expanded in
languages which have a "Suppress-Script" field in the registry are this manner, a range such as "de-DE" would produce a more-distant
predominantly written in a single script. score for content that happened to be labeled "de-Latn-DE" than users
would expect that it should.
Any remaining missing components in the language tag are set to "*"; Any remaining missing components in the language tag are set to "*";
thus an empty language tag becomes the quintuple ("*", "*", "*", "*", thus an empty language tag becomes the quintuple ("*", "*", "*", "*",
"*"). Missing components in the language range are handled similarly "*"). Missing components in the language range are handled similarly
to extended range lookup: missing internal subtags are expanded to to extended range lookup: missing internal subtags are expanded to
"*". Missing end subtags are expanded as the empty string. Thus a "*". Missing end subtags are expanded as the empty string. Thus a
pattern "en-US" becomes the quintuple ("en","*","US","",""). pattern "en-US" becomes the quintuple ("en","*","US","","").
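The quintuple rules above can be sketched as follows. The sketch assumes the five fields are language, script, region, variant, and extension, assigned by subtag shape. Private-use subtags are ignored, and all values are lowercased. The Suppress-Script table is a hypothetical three-entry excerpt, and the function names are illustrative.

```python
import re
from typing import Optional

# Hypothetical excerpt of Suppress-Script values (a real implementation
# would read them from the IANA Language Subtag Registry).
SUPPRESS_SCRIPT = {"de": "latn", "en": "latn", "fr": "latn"}

# Subtag shapes for the four fields after the language, in order.
SHAPES = (
    re.compile(r"[a-z]{4}"),                        # script
    re.compile(r"[a-z]{2}|[0-9]{3}"),               # region
    re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}"),  # variant
    re.compile(r"[a-wyz0-9]"),                      # extension singleton
)


def _fields(text: str, is_range: bool) -> list[Optional[str]]:
    """Split into five fields; None marks an absent field.

    Private-use subtags ("x" and everything after it) are ignored.
    """
    subtags = text.lower().split("-") if text else []
    fields: list[Optional[str]] = [None] * 5
    if not subtags:
        return fields
    fields[0] = subtags[0]
    i = 1
    for slot, shape in enumerate(SHAPES, start=1):
        taken = []
        if is_range and i < len(subtags) and subtags[i] == "*":
            taken.append("*")          # a wildcard fills exactly one field
            i += 1
        elif i < len(subtags) and shape.fullmatch(subtags[i]):
            taken.append(subtags[i])
            i += 1
            if slot == 3:              # variants may repeat
                while i < len(subtags) and shape.fullmatch(subtags[i]):
                    taken.append(subtags[i])
                    i += 1
            elif slot == 4:            # an extension runs up to private use
                while i < len(subtags) and subtags[i] != "x":
                    taken.append(subtags[i])
                    i += 1
        if taken:
            fields[slot] = "-".join(taken)
    return fields


def range_quintuple(language_range: str) -> tuple:
    """Missing fields before the last given one become "*"; later ones ""."""
    fields = _fields(language_range, is_range=True)
    last = max((k for k, f in enumerate(fields) if f is not None), default=-1)
    return tuple(
        f if f is not None else ("*" if k < last else "")
        for k, f in enumerate(fields)
    )


def tag_quintuple(tag: str) -> tuple:
    """Apply the und/mul/Zyyy and Suppress-Script rules, then fill with "*"."""
    fields = _fields(tag, is_range=False)
    if fields[0] in ("und", "mul"):
        fields[0] = None
    if fields[1] == "zyyy":
        fields[1] = None
    elif fields[1] is None and fields[0] in SUPPRESS_SCRIPT:
        fields[1] = SUPPRESS_SCRIPT[fields[0]]
    return tuple(f if f is not None else "*" for f in fields)
```

For example, the range "en-US" yields ("en", "*", "us", "", ""), and the tag "de-DE" picks up its suppressed script to become ("de", "latn", "de", "*", "*").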
Here are some examples of language tags, showing their quintuples as Here are some examples of language tags, showing their quintuples as
both language tags and language ranges: both language tags and language ranges:
skipping to change at page 14, line 27 skipping to change at page 14, line 27
Examples of various tags' distances from the range "en-US": Examples of various tags' distances from the range "en-US":
"fr-FR" 384 (language & region mismatch) "fr-FR" 384 (language & region mismatch)
"fr" 256 (language mismatch, region match) "fr" 256 (language mismatch, region match)
"en-GB" 32 (region mismatch) "en-GB" 32 (region mismatch)
"en-Latn-US" 0 (all fields match) "en-Latn-US" 0 (all fields match)
"en-Brai" 32 (region mismatch) "en-Brai" 32 (region mismatch)
"en-US-x-foo" 4 (variant mismatch: range is the empty string) "en-US-x-foo" 4 (variant mismatch: range is the empty string)
"en-US-r-wadegile" 1 (extension mismatch: range is the empty string) "en-US-r-wadegile" 1 (extension mismatch: range is the empty string)
Note: A variation of this algorithm might vary the scoring used Where a language priority list follows the syntax of the "Accept-
overall or for specific values. For example, sometimes it might make Language" header defined in [RFC2616] (see Section 14.4) and
sense to use more sophisticated weighting that depends on the values [RFC3282], language ranges without a Q value are given values equal
of the corresponding elements. Thus, depending on the domain, an to the value of the previous language range in the list (processing
implementation might assign a smaller distance to the difference from first to last). If the first language range has no Q value, it
between closely related subtags (or treat certain values as equal). is given a value of 1.0. Language ranges with Q values of zero are
Some examples of closely related subtags might be: removed. For example, "fr, en;q=0.5, de, it" becomes
"fr;q=1.0,en;q=0.5,de;q=0.5,it;q=0.5". The distance values given
above are then divided by the Q values. For example, if the
language tag "fr-FR" has a distance of 384 from a language range with
a Q value of 0.8, then the resulting distance is 480 (384 div 0.8).
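The Q-value rules above can be sketched in Python as follows. The names `parse_priority_list` and `weighted_distance` are hypothetical, and the parser assumes a simple comma- and semicolon-separated list rather than validating the full [RFC2616] grammar. Missing Q values are filled in before zero-valued ranges are removed. As a result, a range without a Q value that follows a "q=0" range inherits 0 and is dropped too.

```python
def parse_priority_list(header: str) -> list[tuple[str, float]]:
    """Fill in missing Q values, then drop ranges whose Q value is zero."""
    result: list[tuple[str, float]] = []
    previous_q = 1.0                   # used when the first range has no Q
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        language_range, _, params = item.partition(";")
        q = previous_q                 # inherit the previous range's Q value
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        previous_q = q
        if q > 0:                      # ranges with a zero Q value are removed
            result.append((language_range.strip(), q))
    return result


def weighted_distance(distance: float, q: float) -> float:
    """Divide a raw distance by the Q value of the range it was scored against."""
    return distance / q
```

Parsing "fr, en;q=0.5, de, it" gives fr at 1.0 and the other three at 0.5. A distance of 384 against a range with Q 0.8 becomes 480 (up to floating-point rounding).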
Implementations or protocols MAY use different weighting systems than
the ones described above, as long as the weightings and weighting
mechanisms are clearly specified. Thus, for example, an
implementation or protocol could give all language ranges with missing
Q values a value of 1.0, or give the distance value 1000 to a
language mismatch. They MAY also use more sophisticated weights that
depend on the values of the corresponding elements. For example, an
implementation might assign a small distance to differences between
closely related subtags. Some examples of such subtags might be:
Language: Language:
no (Norwegian) no (Norwegian)
nb (Bokmal Norwegian) nb (Bokmal Norwegian)
nn (Nynorsk Norwegian) nn (Nynorsk Norwegian)
Script: Script:
Kata (katakana) Kata (katakana)
Hira (hiragana) Hira (hiragana)
skipping to change at page 17, line 19 skipping to change at page 17, line 33
manner or the same request will produce widely varying results. manner or the same request will produce widely varying results.
Implementations that accept extended language ranges MUST define Implementations that accept extended language ranges MUST define
which content is returned when more than one item matches the which content is returned when more than one item matches the
extended language range. extended language range.
For example, an implementation could return the matching content that For example, an implementation could return the matching content that
is first in ASCII order. Thus, if the language range were is first in ASCII order. Thus, if the language range were
"*-CH" and the set of content included "de-CH", "fr-CH", and "it-CH", "*-CH" and the set of content included "de-CH", "fr-CH", and "it-CH",
then the content labeled "de-CH" would be returned. then the content labeled "de-CH" would be returned.
Another way an implementation could address extended language ranges Implementations MAY also map extended language ranges to basic
would be to map them to basic language ranges: if the first subtag is language ranges: if the first subtag is a "*" then the entire range
a "*" then the entire range is treated as "*" (which matches the is treated as "*" (which matches the default content), otherwise each
default content), otherwise the wildcard subtag is removed. For wildcard subtag is removed. For example, if the language range were
example, if the language range were "en-*-US", then the range would "en-*-US", then the range would be mapped to "en-US".
be mapped to "en-US".
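The two policies above for resolving an extended range to a single item can be sketched in Python. The function names are hypothetical. "ASCII order" is interpreted here case-insensitively, because case carries no meaning in tags; that interpretation is an assumption.

```python
from typing import Optional


def to_basic_range(extended_range: str) -> str:
    """Map an extended range to a basic one by dropping wildcards."""
    subtags = extended_range.split("-")
    if subtags[0] == "*":
        return "*"                     # matches the default content
    return "-".join(s for s in subtags if s != "*")


def first_in_ascii_order(matches: list[str]) -> Optional[str]:
    """Pick one matching item deterministically, or None if none matched."""
    return min(matches, key=str.lower) if matches else None
```

With the range "*-CH", the first policy picks "de-CH" from "de-CH", "fr-CH", and "it-CH". The second maps the range to "*", so the default content is returned instead.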
Where a language priority list contains Q values, as in the syntax of
the "Accept-Language" header defined in [RFC2616] (see Section 14.4)
and [RFC3282], each language tag without a Q value is given the Q
value of the previous language tag (processing from first to last).
If the first language tag has no Q value, it is given a value of 1.0.
Language tags with a Q value of zero are then removed. For example,
"fr, en;q=0.5, de, it" becomes "fr;q=1.0, en;q=0.5, de;q=0.5,
it;q=0.5". The language priority list is then sorted from highest
priority to lowest, and any two language tags with the same Q value
remain in the same order as in the original language priority list.
This list is then traversed as described above when performing
lookup.
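The Q-value processing above can be sketched as follows, assuming well-formed input (comma-separated entries, optional ";q=" parameters that float() can parse). The function name normalize_priority_list() is hypothetical. It applies Q-value inheritance to every entry first, removes zero-Q entries afterwards, and then relies on the stability of Python's sorted() to keep equal-Q tags in their original order.

```python
from typing import List, Tuple


def normalize_priority_list(header: str) -> List[Tuple[str, float]]:
    entries = []
    previous_q = 1.0  # a first tag with no Q value is given 1.0
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = previous_q  # default: inherit the previous tag's Q value
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        previous_q = q
        entries.append((tag.strip(), q))
    # Zero-Q tags are removed only after inheritance has been applied.
    kept = [(tag, q) for tag, q in entries if q > 0]
    # sorted() is stable, so tags with equal Q values keep their original order.
    return sorted(kept, key=lambda entry: -entry[1])


print(normalize_priority_list("fr, en;q=0.5, de, it"))
# [('fr', 1.0), ('en', 0.5), ('de', 0.5), ('it', 0.5)]
```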
Implementations or protocols MAY use lookup mechanisms other than the
ones described above, as long as those mechanisms are clearly
specified.
4. Other Considerations 4. Other Considerations
When working with language ranges and matching schemes, there are When working with language ranges and matching schemes, there are
some additional points that may influence the choice of either. some additional points that may influence the choice of either.
4.1. Choosing Language Ranges 4.1. Choosing Language Ranges
Users indicate their language preferences via the choice of a Users indicate their language preferences via the choice of a
language range or the list of language ranges in a language priority language range or the list of language ranges in a language priority
skipping to change at page 18, line 40 skipping to change at page 19, line 40
vast majority of English documents are written in the Latin script vast majority of English documents are written in the Latin script
and thus the 'en' language subtag has a Suppress-Script field for and thus the 'en' language subtag has a Suppress-Script field for
'Latn' in the registry). 'Latn' in the registry).
When working with tags and ranges, note that extensions and most When working with tags and ranges, note that extensions and most
private-use subtags are orthogonal to language tag matching, in that private-use subtags are orthogonal to language tag matching, in that
they specify additional attributes of the text not related to the they specify additional attributes of the text not related to the
goals of most matching schemes. Users SHOULD avoid using these goals of most matching schemes. Users SHOULD avoid using these
subtags in language ranges, since they interfere with the selection subtags in language ranges, since they interfere with the selection
of available content. When used in language tags (as opposed to of available content. When used in language tags (as opposed to
ranges), these subtags normally do not interefer with filtering ranges), these subtags normally do not interfere with filtering
(Section 3), since they appear at the end of the tag and will match (Section 3), since they appear at the end of the tag and will match
all prefixes. all prefixes.
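Why trailing extension and private-use subtags do not disturb filtering can be illustrated with a prefix-based match. The sketch below assumes the basic filtering rule (a range matches a tag if it equals the tag, or equals a prefix of it that ends at a "-" boundary, compared case-insensitively; "*" matches any tag); basic_filter_matches() is a hypothetical name.

```python
def basic_filter_matches(language_range: str, tag: str) -> bool:
    # Case-insensitive prefix match on subtag boundaries; "*" matches any tag.
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


print(basic_filter_matches("en-US", "en-US-x-twain"))  # True: suffix is harmless in a tag
print(basic_filter_matches("en-US-x-twain", "en-US"))  # False: same subtags in a range exclude content
```

The second call shows the asymmetry the text warns about: the same subtags that are harmless at the end of a tag narrow the selection when placed in a range.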
When working with language tags and language ranges note that: When working with language tags and language ranges note that:
o Private-use and Extension subtags are normally orthogonal to o Private-use and Extension subtags are normally orthogonal to
language tag fallback. Implementations or specifications that use language tag fallback. Implementations or specifications that use
a lookup (Section 3.3) matching scheme often ignore unrecognized a lookup (Section 3.3) matching scheme often ignore unrecognized
private-use and extension subtags when performing language tag private-use and extension subtags when performing language tag
fallback. In addition, since these subtags are always at the end fallback. In addition, since these subtags are always at the end
skipping to change at page 28, line 21 skipping to change at page 29, line 21
The contributors to [RFC3066bis], [RFC3066] and [RFC1766], each of The contributors to [RFC3066bis], [RFC3066] and [RFC1766], each of
which is a precursor to this document, made enormous contributions which is a precursor to this document, made enormous contributions
directly or indirectly to this document and are generally responsible directly or indirectly to this document and are generally responsible
for the success of language tags. for the success of language tags.
The following people (in alphabetical order by family name) The following people (in alphabetical order by family name)
contributed to this document: contributed to this document:
Harald Alvestrand, Jeremy Carroll, John Cowan, Martin Duerst, Frank Harald Alvestrand, Jeremy Carroll, John Cowan, Martin Duerst, Frank
Ellermann, Doug Ewell, Marion Gunn, Kent Karlsson, Ira McDonald, M. Ellermann, Doug Ewell, Marion Gunn, Kent Karlsson, Ira McDonald, M.
Patton, Randy Presuhn, Eric van der Poel, and many, many others. Patton, Randy Presuhn, Eric van der Poel, Markus Scherer, and many,
many others.
Very special thanks must go to Harald Tveit Alvestrand, who Very special thanks must go to Harald Tveit Alvestrand, who
originated RFCs 1766 and 3066, and without whom this document would originated RFCs 1766 and 3066, and without whom this document would
not have been possible. not have been possible.
For this particular document, John Cowan originated the scheme For this particular document, John Cowan originated the scheme
described in Section 3.2.3. Mark Davis originated the scheme described in Section 3.2.3. Mark Davis originated the scheme
described in Section 3.3. described in Section 3.3.
Authors' Addresses Authors' Addresses
Addison Phillips (editor) Addison Phillips (editor)
Quest Software Yahoo! Inc
Email: addison dot phillips at quest dot com Email: addison at inter dash locale dot com
Mark Davis (editor) Mark Davis (editor)
IBM Google
Email: mark dot davis at ibm dot com Email: mark dot davis at macchiato dot com
Intellectual Property Statement Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be on the procedures with respect to rights in RFC documents can be
skipping to change at page 30, line 41 skipping to change at page 31, line 41
This document and the information contained herein are provided on an This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights. except as set forth therein, the authors retain all their rights.
Acknowledgment Acknowledgment
Funding for the RFC Editor function is currently provided by the Funding for the RFC Editor function is currently provided by the
Internet Society. Internet Society.
 End of changes. 31 change blocks. 
70 lines changed or deleted 110 lines changed or added

This html diff was produced by rfcdiff 1.29, available from http://www.levkowetz.com/ietf/tools/rfcdiff/