draft-sparks-sip-nit-problems-00.txt   draft-sparks-sip-nit-problems-01.txt 
Network Working Group R. Sparks Network Working Group R. Sparks
Internet-Draft dynamicsoft Internet-Draft dynamicsoft
Expires: August 6, 2004 February 6, 2004 Expires: January 13, 2005 July 15, 2004
Problems identified associated with the Session Initiation Protocol's Problems identified associated with the Session Initiation Protocol's
non-INVITE Transaction non-INVITE Transaction
draft-sparks-sip-nit-problems-00 draft-sparks-sip-nit-problems-01
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with By submitting this Internet-Draft, I certify that any applicable
all provisions of Section 10 of RFC2026. patent or other IPR claims of which I am aware have been disclosed,
and any of which I become aware will be disclosed, in accordance with
RFC 3668.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other Task Force (IETF), its areas, and its working groups. Note that
groups may also distribute working documents as Internet-Drafts. other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at http:// The list of current Internet-Drafts can be accessed at
www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 6, 2004. This Internet-Draft will expire on January 13, 2005.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2004). All Rights Reserved. Copyright (C) The Internet Society (2004). All Rights Reserved.
Abstract Abstract
This draft describes several problems that have been identified with This draft describes several problems that have been identified with
the Session Initiation Protocol's non-INVITE transaction. the Session Initiation Protocol's non-INVITE transaction.
Table of Contents Table of Contents
1. Problems under the current specifications . . . . . . . . . . 3 1. Problems under the current specifications . . . . . . . . . . 3
1.1 NITs must complete immediately or risk losing a race . . . . . 3 1.1 NITs must complete immediately or risk losing a race . . . 3
1.2 Provisional responses can delay recovery from lost final 1.2 Provisional responses can delay recovery from lost
responses . . . . . . . . . . . . . . . . . . . . . . . . . . 4 final responses . . . . . . . . . . . . . . . . . . . . . 4
1.3 Delayed responses will temporarily blacklist an element . . . 5 1.3 Delayed responses will temporarily blacklist an element . 5
1.4 408 for non-INVITE is not useful . . . . . . . . . . . . . . . 6 1.4 408 for non-INVITE is not useful . . . . . . . . . . . . . 6
1.5 Non-INVITE timeouts doom forking proxies . . . . . . . . . . . 8 1.5 Non-INVITE timeouts doom forking proxies . . . . . . . . . 8
1.6 Mismatched timer values make winning the race harder . . . . . 8 1.6 Mismatched timer values make winning the race harder . . . 8
2. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 9 2. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 9
References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Author's Address . . . . . . . . . . . . . . . . . . . . . . . 9 Author's Address . . . . . . . . . . . . . . . . . . . . . . . 9
Intellectual Property and Copyright Statements . . . . . . . . 10 Intellectual Property and Copyright Statements . . . . . . . . 10
1. Problems under the current specifications 1. Problems under the current specifications
There are a number of unpleasant edge conditions created by the SIP There are a number of unpleasant edge conditions created by the SIP
non-INVITE transaction model's fixed duration. The negative aspects non-INVITE transaction model's fixed duration. The negative aspects
of some of these are exacerbated by the effect provisional responses of some of these are exacerbated by the effect provisional responses
have on the non-INVITE transaction state machines as currently have on the non-INVITE transaction state machines as currently
defined. defined.
1.1 NITs must complete immediately or risk losing a race 1.1 NITs must complete immediately or risk losing a race
The non-INVITE transaction defined in RFC 3261 [1] is designed to The non-INVITE transaction defined in RFC 3261 [1] is designed to
have a fixed and finite duration (dependent on T1). A consequence of have a fixed and finite duration (dependent on T1). A consequence of
this design is that participants must strive to complete the this design is that participants must strive to complete the
transaction as quickly as possible. Consider the race condition shown transaction as quickly as possible. Consider the race condition
in Figure 1. shown in Figure 1.
UAC UAS UAC UAS
| request | | request |
--- |---. | --- |---. |
^ | `---. | ^ | `---. |
| | `-->| --- | | `-->| ---
| | | ^ | | | ^
| | | | | | | |
64*T1 | | | 64*T1 | | |
| | | | | | | |
skipping to change at page 3, line 46 skipping to change at page 3, line 46
| .---' | --- | .---' | ---
|<--' | |<--' |
Figure 1: NI Race Condition Figure 1: NI Race Condition
The UAS in this figure believes it has responded to the request in The UAS in this figure believes it has responded to the request in
time, and that the request succeeded. The UAC, on the other hand, time, and that the request succeeded. The UAC, on the other hand,
believes the request has timed out, hence failed. No longer having a believes the request has timed out, hence failed. No longer having a
matching client transaction, the UAC core will ignore what it matching client transaction, the UAC core will ignore what it
believes to be a spurious response. As far as the UAC is concerned, believes to be a spurious response. As far as the UAC is concerned,
it received no response at all to its request. The ultimate result is it received no response at all to its request. The ultimate result
the UAS and UAC have conflicting views of the outcome of the is the UAS and UAC have conflicting views of the outcome of the
transaction. transaction.
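The race in Figure 1 can be sketched as simple arithmetic. The following is a minimal illustration, not part of either draft: the function name and the delay parameters are hypothetical, and T1 is assumed to be the RFC 3261 default of 0.5 seconds. The UAC gives up 64*T1 after sending the request, while the UAS measures its own 64*T1 window from the moment the request arrives.

```python
def uac_sees_final_response(request_delay, uas_hold_time, response_delay, t1=0.5):
    """Return True if the final response reaches the UAC before its timeout.

    All arguments are in seconds. request_delay and response_delay are the
    one-way propagation times (unknown to the UAS in practice);
    uas_hold_time is how long the UAS waits before sending its final response.
    """
    uac_timeout = 64 * t1  # UAC abandons the transaction this long after sending
    arrival = request_delay + uas_hold_time + response_delay
    return arrival < uac_timeout


if __name__ == "__main__":
    t1 = 0.5
    # UAS answers just inside its own 64*T1 window: the UAC has already timed out.
    print(uac_sees_final_response(0.1, 64 * t1 - 0.05, 0.1, t1))
    # UAS answers promptly: the response arrives in time.
    print(uac_sees_final_response(0.1, 1.0, 0.1, t1))
```

Under these assumptions, a UAS that responds within its own window can still lose the race, because the propagation delays count against the UAC's timer but not the UAS's.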
Therefore, a UAS cannot wait until the last possible moment to send a Therefore, a UAS cannot wait until the last possible moment to send a
final response within a NIT. It must, instead, send its response so final response within a NIT. It must, instead, send its response so
that it will arrive at the UAC before that UAC times out. that it will arrive at the UAC before that UAC times out.
Unfortunately, the UAS has no way to accurately measure the Unfortunately, the UAS has no way to accurately measure the
propagation time of the request or predict the propagation time of propagation time of the request or predict the propagation time of
the response. The uncertainty it faces is compounded by each proxy the response. The uncertainty it faces is compounded by each proxy
that participates in the transaction. Thus, the UAS's only choice is that participates in the transaction. Thus, the UAS's only choice is
to send its final response as soon as it possibly can and hope for to send its final response as soon as it possibly can and hope for
skipping to change at page 4, line 36 skipping to change at page 4, line 36
time to delay its final response in order to perform some processing time to delay its final response in order to perform some processing
such as a database lookup while mitigating its risk of losing the such as a database lookup while mitigating its risk of losing the
race in Figure 1. Establishing this knowledge across arbitrary race in Figure 1. Establishing this knowledge across arbitrary
networks (perhaps using resource reservation techniques and networks (perhaps using resource reservation techniques and
deterministic transports) is not currently feasible. deterministic transports) is not currently feasible.
1.2 Provisional responses can delay recovery from lost final responses 1.2 Provisional responses can delay recovery from lost final responses
The non-INVITE client transaction state machine provides reliability The non-INVITE client transaction state machine provides reliability
for NITs over unreliable transports (UDP) through retransmission of for NITs over unreliable transports (UDP) through retransmission of
the request message. Timer E is set to T1 when a request is initially the request message. Timer E is set to T1 when a request is
transmitted. As long as the machine remains in the Trying state, each initially transmitted. As long as the machine remains in the Trying
time Timer E fires, it will be reset to twice its previous value state, each time Timer E fires, it will be reset to twice its
(capping at T2) and the request is retransmitted. previous value (capping at T2) and the request is retransmitted.
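As a rough sketch of the retransmission schedule described above, the following computes when a client that stays in the Trying state would retransmit. It assumes the RFC 3261 defaults (T1 = 0.5 s, T2 = 4 s) and that the transaction ends when Timer F fires at 64*T1; the function name is hypothetical.

```python
def trying_retransmit_times(t1=0.5, t2=4.0):
    """List the times (seconds after the first send) at which the request
    is retransmitted while the client transaction remains in Trying."""
    timer_f = 64 * t1      # assumed transaction timeout (Timer F)
    timer_e = t1           # Timer E starts at T1
    elapsed = timer_e      # first firing of Timer E
    times = []
    while elapsed < timer_f:
        times.append(elapsed)
        timer_e = min(2 * timer_e, t2)  # double on each firing, capped at T2
        elapsed += timer_e
    return times


if __name__ == "__main__":
    print(trying_retransmit_times())
```

With the default values this yields retransmissions at 0.5, 1.5, 3.5 and 7.5 seconds, then every 4 seconds until the transaction times out.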
If the non-INVITE client transaction state machine sees a provisional If the non-INVITE client transaction state machine sees a provisional
response, it transitions to the Proceeding state, where response, it transitions to the Proceeding state, where
retransmission continues, but the algorithm for resetting Timer E is retransmission continues, but the algorithm for resetting Timer E is
simply to use T2 instead of doubling at each firing. (Note that Timer simply to use T2 instead of doubling at each firing. (Note that
E is not altered during the transition to Proceeding). Timer E is not altered during the transition to Proceeding).
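The two reset rules for Timer E can be written side by side. This is only an illustrative sketch: the function name and the string state labels are assumptions, and T2 is taken to be the RFC 3261 default of 4 seconds.

```python
def next_timer_e(state, current, t2=4.0):
    """Return the value Timer E is reset to when it fires.

    state   -- "Trying" or "Proceeding" (client transaction state)
    current -- the value Timer E held when it fired, in seconds
    """
    if state == "Trying":
        return min(2 * current, t2)  # double, capped at T2
    if state == "Proceeding":
        return t2                    # always T2, regardless of the previous value
    raise ValueError("Timer E only runs in Trying or Proceeding")


if __name__ == "__main__":
    t1 = 0.5
    print(next_timer_e("Trying", t1))      # doubling rule
    print(next_timer_e("Proceeding", t1))  # jumps straight to T2
```

The second call shows the effect of an early provisional response: the next retransmission interval jumps from T1 directly to T2 rather than to 2*T1.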
Making the transition to the Proceeding state before Timer E is reset Making the transition to the Proceeding state before Timer E is reset
to T2 can cause recovery from a lost final response to take extra to T2 can cause recovery from a lost final response to take extra
time. Figure 2 shows recovery from a lost final response with and time. Figure 2 shows recovery from a lost final response with and
without a provisional message during this window. Recovery occurs without a provisional message during this window. Recovery occurs
within 2*T1 in the case without the provisional. With the within 2*T1 in the case without the provisional. With the
provisional, recovery is delayed until T2, which by default is 8*T1. provisional, recovery is delayed until T2, which by default is 8*T1.
In practical terms, a provisional response to a NIT in currently In practical terms, a provisional response to a NIT in currently
deployed networks can delay transaction completion by up to 3.5 deployed networks can delay transaction completion by up to 3.5
skipping to change at page 6, line 6 skipping to change at page 6, line 6
T2. T2.
1.3 Delayed responses will temporarily blacklist an element 1.3 Delayed responses will temporarily blacklist an element
A SIP element's use of SRV is specified in RFC 3263 [2]. That A SIP element's use of SRV is specified in RFC 3263 [2]. That
specification discusses how SIP assures high availability by having specification discusses how SIP assures high availability by having
upstream elements detect failure of downstream elements. It proceeds upstream elements detect failure of downstream elements. It proceeds
to define several types of failure detection and instructions for to define several types of failure detection and instructions for
failover. Two of the behaviors it describes are important to this failover. Two of the behaviors it describes are important to this
document: document:
o Within a transaction, transport failure is detected either through o Within a transaction, transport failure is detected either through
an explicit report from the transport layer or through timeout. an explicit report from the transport layer or through timeout.
Note specifically that timeout will indicate transport failure Note specifically that timeout will indicate transport failure
regardless of the transport in use. When transport failure is regardless of the transport in use. When transport failure is
detected, the request is retried at the next element from the detected, the request is retried at the next element from the
sorted results of the SRV query. sorted results of the SRV query.
o Between transactions, locations reporting temporary failure o Between transactions, locations reporting temporary failure
(through 503/Retry-After for example) are not used until their (through 503/Retry-After for example) are not used until their
requested black-out period expires. requested black-out period expires.
The specification notes the benefit of caching locations that are The specification notes the benefit of caching locations that are
successfully contacted, but does not discuss how such a cache is successfully contacted, but does not discuss how such a cache is
maintained. It is unclear whether an element should stop using maintained. It is unclear whether an element should stop using
(temporarily blacklist) a location returned in the SRV query that (temporarily blacklist) a location returned in the SRV query that
results in a transport error. If it does, when should such a location results in a transport error. If it does, when should such a
be removed from the blacklist? location be removed from the blacklist?
Without such a blacklist (or equivalent mechanism), the intended Without such a blacklist (or equivalent mechanism), the intended
availability mechanism fails miserably. Consider traffic between two availability mechanism fails miserably. Consider traffic between two
domains. Proxy pA in domain A needs to forward a sequence of domains. Proxy pA in domain A needs to forward a sequence of
non-INVITE requests to domain B. Through DNS SRV, pA discovers pB1 non-INVITE requests to domain B. Through DNS SRV, pA discovers pB1
and pB2, and the ordering rules of [2] and [3] indicate it should use and pB2, and the ordering rules of [2] and [3] indicate it should use
pB1 first. The first request to pB1 times out. Since pA is a proxy pB1 first. The first request to pB1 times out. Since pA is a proxy
and a NIT has a fixed duration, pA has no opportunity to retry the and a NIT has a fixed duration, pA has no opportunity to retry the
request at pB2. If pA does not remember pB1's failure, the second request at pB2. If pA does not remember pB1's failure, the second
request (and all subsequent non-INVITE requests until pB1 recovers) request (and all subsequent non-INVITE requests until pB1 recovers)
are doomed to the same failure. Caching would allow the subsequent are doomed to the same failure. Caching would allow the subsequent
requests to be tried at pB2. requests to be tried at pB2.
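One possible shape for such a cache is sketched below. Neither draft nor RFC 3263 specifies this mechanism; the class name, the 30-second blackout period, and the injectable clock are all assumptions made for illustration.

```python
import time


class FailedTargetCache:
    """Remembers SRV targets that timed out and skips them for a while."""

    def __init__(self, blackout_seconds=30.0, clock=time.monotonic):
        self._blackout = blackout_seconds  # hypothetical blackout period
        self._clock = clock
        self._failed_at = {}               # target -> time of last timeout

    def record_timeout(self, target):
        self._failed_at[target] = self._clock()

    def is_usable(self, target):
        failed = self._failed_at.get(target)
        if failed is None:
            return True
        if self._clock() - failed >= self._blackout:
            del self._failed_at[target]    # blackout expired, forget the failure
            return True
        return False

    def pick(self, ordered_targets):
        """Return the first usable target in SRV order, or None."""
        for target in ordered_targets:
            if self.is_usable(target):
                return target
        return None


if __name__ == "__main__":
    cache = FailedTargetCache()
    cache.record_timeout("pB1")
    print(cache.pick(["pB1", "pB2"]))  # pB1 is skipped while blacklisted
```

In the pA example, recording the first timeout against pB1 would let later non-INVITE requests go to pB2 until the blackout expires. How long that period should be is exactly the open question raised above.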
Since miserable failure is not acceptable in deployed networks, we Since miserable failure is not acceptable in deployed networks, we
should anticipate that elements will, in fact, cache timeout failures should anticipate that elements will, in fact, cache timeout failures
between transactions. Then the race in Figure 1 becomes important. If between transactions. Then the race in Figure 1 becomes important.
an element fails to respond "soon enough", it has effectively not If an element fails to respond "soon enough", it has effectively not
responded at all, and will be blacklisted at its peer for some period responded at all, and will be blacklisted at its peer for some period
of time. of time.
(Note that even with caching, the first request timeout results in a (Note that even with caching, the first request timeout results in a
timeout failure all the way back to the original submitter. The timeout failure all the way back to the original submitter. The
failover mechanisms in [2] work well to increase the resiliency of a failover mechanisms in [2] work well to increase the resiliency of a
given INVITE transaction, but do nothing for a given non-INVITE given INVITE transaction, but do nothing for a given non-INVITE
transaction.) transaction.)
1.4 408 for non-INVITE is not useful 1.4 408 for non-INVITE is not useful
Consider the race condition in Figure 1 when the final response is Consider the race condition in Figure 1 when the final response is
408 instead of 200. Under the current specification, the race is 408 instead of 200. Under the current specification, the race is
guaranteed to be lost. Most existing endpoints will emit a 408 for a guaranteed to be lost. Most existing endpoints will emit a 408 for a
non-INVITE request 64*T1 after receiving the request if they haven't non-INVITE request 64*T1 after receiving the request if they haven't
emitted an earlier final response. Such a 408 is guaranteed to arrive emitted an earlier final response. Such a 408 is guaranteed to
at the next upstream element too late to be useful. In fact, in the arrive at the next upstream element too late to be useful. In fact,
presence of proxies, these messages are even harmful. When the 408 in the presence of proxies, these messages are even harmful. When
arrives, each proxy will have already terminated its associated the 408 arrives, each proxy will have already terminated its
client transaction due to timeout. So, each proxy must forward the associated client transaction due to timeout. So, each proxy must
408 upstream statelessly. This, in turn, is guaranteed to arrive too forward the 408 upstream statelessly. This, in turn, is guaranteed
late. As Figure 3 shows, this can ultimately result in bombarding to arrive too late. As Figure 3 shows, this can ultimately result
the original requester with spurious 408s. (Note that the proxy's in bombarding the original requester with spurious 408s. (Note that
client transaction state machine never enters the Completed state, so the proxy's client transaction state machine never enters the
Timer K does not enter into play). Completed state, so Timer K does not enter into play).
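The claim that such a 408 always arrives too late can be checked with a small calculation. The sketch below is illustrative only: it assumes symmetric one-way delays on each hop, no processing time at the proxies, a UAS that emits the 408 exactly 64*T1 after the request arrives, and the RFC 3261 default T1 of 0.5 seconds. The function name is hypothetical.

```python
def late_408_hops(hop_delays, t1=0.5):
    """For each upstream element, report whether the 408 arrives after that
    element's own client transaction has timed out.

    hop_delays[i] is the one-way delay (seconds) of the i-th hop, ordered
    from the UAC toward the UAS. Element i is the sender on hop i.
    """
    timeout = 64 * t1
    results = []
    for i in range(len(hop_delays)):
        downstream = sum(hop_delays[i:])             # request: element i -> UAS
        arrival = downstream + timeout + downstream  # 408 returns on the same path
        results.append(arrival > timeout)            # True means "too late"
    return results


if __name__ == "__main__":
    # UAC -> P1 -> P2 -> P3 -> UAS, as in Figure 3
    print(late_408_hops([0.01, 0.02, 0.01, 0.03]))
```

Any positive propagation delay makes every entry True: each element's timer started before the UAS's did, so a 408 sent at the UAS's 64*T1 mark cannot win.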
UAC P1 P2 P3 UAS UAC P1 P2 P3 UAS
| | | | | | | | | |
--- ===---. | | | | --- ===---. | | | |
^ | `-->===---. | | | ^ | `-->===---. | | |
| | | `-->===---. | | | | | `-->===---. | |
| | | | `-->===---. | | | | | `-->===---. |
64*T1 | | | | `-->=== 64*T1 | | | | `-->===
| | | | | | | | | | | |
| | | | | | | | | | | |
skipping to change at page 9, line 12 skipping to change at page 9, line 12
must be taken when deploying systems with non-defaults to ensure they must be taken when deploying systems with non-defaults to ensure they
will _never_ directly communicate with elements with default values. will _never_ directly communicate with elements with default values.
2. Acknowledgments 2. Acknowledgments
This document captures many conversations about non-INVITE issues. This document captures many conversations about non-INVITE issues.
Significant contributors include Ben Campbell, Gonzalo Camarillo, Significant contributors include Ben Campbell, Gonzalo Camarillo,
Steve Donovan, Rohan Mahy, Dan Petrie, Adam Roach, Jonathan Steve Donovan, Rohan Mahy, Dan Petrie, Adam Roach, Jonathan
Rosenberg, and Dean Willis. Rosenberg, and Dean Willis.
References 3 References
[1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., [1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
Session Initiation Protocol", RFC 3261, June 2002. Session Initiation Protocol", RFC 3261, June 2002.
[2] Rosenberg, J. and H. Schulzrinne, "Session Initiation Protocol [2] Rosenberg, J. and H. Schulzrinne, "Session Initiation Protocol
(SIP): Locating SIP Servers", RFC 3263, June 2002. (SIP): Locating SIP Servers", RFC 3263, June 2002.
[3] Gulbrandsen, A., Vixie, P. and L. Esibov, "A DNS RR for [3] Gulbrandsen, A., Vixie, P. and L. Esibov, "A DNS RR for
specifying the location of services (DNS SRV)", RFC 2782, specifying the location of services (DNS SRV)", RFC 2782,
skipping to change at page 10, line 8 skipping to change at page 10, line 8
dynamicsoft dynamicsoft
5100 Tennyson Parkway 5100 Tennyson Parkway
Suite 1200 Suite 1200
Plano, TX 75024 Plano, TX 75024
EMail: rsparks@dynamicsoft.com EMail: rsparks@dynamicsoft.com
Intellectual Property Statement Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it might or might not be available; nor does it represent that it has
has made any effort to identify any such rights. Information on the made any independent effort to identify any such rights. Information
IETF's procedures with respect to rights in standards-track and on the procedures with respect to rights in RFC documents can be
standards-related documentation can be found in BCP-11. Copies of found in BCP 78 and BCP 79.
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to Copies of IPR disclosures made to the IETF Secretariat and any
obtain a general license or permission for the use of such assurances of licenses to be made available, or the result of an
proprietary rights by implementors or users of this specification can attempt made to obtain a general license or permission for the use of
be obtained from the IETF Secretariat. such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF Executive this standard. Please address the information to the IETF at
Director. ietf-ipr@ietf.org.
Full Copyright Statement
Copyright (C) The Internet Society (2004). All Rights Reserved. Disclaimer of Validity
This document and translations of it may be copied and furnished to This document and the information contained herein are provided on an
others, and derivative works that comment on or otherwise explain it "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
or assist in its implementation may be prepared, copied, published OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
and distributed, in whole or in part, without restriction of any ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
kind, provided that the above copyright notice and this paragraph are INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
included on all such copies and derivative works. However, this INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
document itself may not be modified in any way, such as by removing WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be Copyright Statement
revoked by the Internet Society or its successors or assignees.
This document and the information contained herein is provided on an Copyright (C) The Internet Society (2004). This document is subject
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING to the rights, licenses and restrictions contained in BCP 78, and
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING except as set forth therein, the authors retain all their rights.
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgment Acknowledgment
Funding for the RFC Editor function is currently provided by the Funding for the RFC Editor function is currently provided by the
Internet Society. Internet Society.
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/