draft-ietf-tcpm-persist-01.txt   draft-ietf-tcpm-persist-02.txt 
TCP Maintenance and Minor M. Bashyam TCP Maintenance and Minor M. Bashyam
Extensions Working Group Ocarina Networks, Inc Extensions Working Group Ocarina Networks, Inc
Internet-Draft M. Jethanandani Internet-Draft M. Jethanandani
Intended status: Informational A. Ramaiah Intended status: Informational A. Ramaiah
Expires: May 14, 2011 Cisco Systems Expires: August 18, 2011 Cisco
November 10, 2010 February 14, 2011
Clarification of sender behaviour in persist condition. Clarification of sender behavior in persist condition.
draft-ietf-tcpm-persist-01.txt draft-ietf-tcpm-persist-02.txt
Abstract Abstract
This document attempts to clarify the notion of the Zero Window This document clarifies the Zero Window Probes (ZWP) described in
Probes (ZWP) described in RFC 1122 [RFC1122]. In particular, it [RFC1122]. In particular, it clarifies the actions that can be taken
clarifies the actions that can be taken on connections which are on connections which are experiencing the ZWP condition.
experiencing the ZWP condition. The motivation for this document
stems from the belief that TCP implementations strictly adhering to
the current RFC language have the potential to become vulnerable to
Denial of Service (DoS) scenarios.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 14, 2011. This Internet-Draft will expire on August 18, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Discussion on RFC 1122 Requirement . . . . . . . . . . . . . . 4 2. Discussion on RFC 1122 Requirement . . . . . . . . . . . . . . 4
3. Description of Attack . . . . . . . . . . . . . . . . . . . . 5 3. Description of one Simple Attack . . . . . . . . . . . . . . . 5
4. Clarification Regarding RFC 1122 Requirements . . . . . . . . 6 4. Clarification Regarding RFC 1122 Requirements . . . . . . . . 6
5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 7 5. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 7
6. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 8 6. Appendix A, Programming Considerations . . . . . . . . . . . . 8
7. Programming Considerations . . . . . . . . . . . . . . . . . . 9 7. Informative References . . . . . . . . . . . . . . . . . . . . 10
8. Informative References . . . . . . . . . . . . . . . . . . . . 10
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11
1. Introduction 1. Introduction
TCP implementations strictly adhering to Section 4.2.2.17 of Section 4.2.2.17 of [RFC1122] says:
[RFC1122] have the potential to become vulnerable to Denial of
Service (DoS) scenarios. That section of [RFC1122] says:
"A TCP MAY keep its offered receive window closed indefinitely. "A TCP MAY keep its offered receive window closed indefinitely.
As long as the receiving TCP continues to send acknowledgments in As long as the receiving TCP continues to send acknowledgments in
response to the probe segments, the sending TCP MUST allow the response to the probe segments, the sending TCP MUST allow the
connection to stay open." connection to stay open."
DISCUSSION: DISCUSSION:
It is extremely important to remember that ACK (acknowledgment) It is extremely important to remember that ACK (acknowledgment)
segments that contain no data are not reliably transmitted by segments that contain no data are not reliably transmitted by
TCP. TCP.
Therefore zero window probing SHOULD be supported to prevent a Therefore zero window probing SHOULD be supported to prevent a
connection from hanging forever if ACK segments that re-opens the connection from hanging forever if ACK segments that re-opens the
window is lost. The condition where the sender goes into the Zero- window is lost. The condition where the sender goes into the Zero-
Window Probe (ZWP) mode is typically known as the 'persist Window Probe (ZWP) mode is typically known as the 'persist
condition'. It is under this condition that the sending TCP can condition'.
become vulnerable to DoS.
2. Discussion on RFC 1122 Requirement This guidance is not intended to preclude resource management by the
operating system or application, which may request connections to be
aborted regardless of them being in the persist condition, and the
TCP implementation should, of course, comply by aborting such
connections. TCP implementations strictly adhering to Section
4.2.2.17 of [RFC1122] have the potential to make systems vulnerable
to Denial of Service (DoS) scenarios where attackers tie up resources
by keeping connections in the persist condition, if such resource
management is not performed external to the protocol implementation.
It needs to be emphasised that TCP MUST NOT take any action of its Section 2 of this document describes why implementations must not
own when a particular connection is in persist condition for a long close connections merely because they are in the persist condition,
time. As per RFC 1122 as long as the ACK's are being received for yet must still allow such connections to be closed on command.
window probes, it can continue to stay in persist condition. This is Section 3 outlines a simple attack on systems that do not
important because typically applications would want the TCP sufficiently manage connections in this state. Section 4 concludes
connection to stay open unless it explicitly closes the connection. with a requirements-language clarification to the RFC 1122
For example take the case of user running a print job and the printer requirement.
ran out of paper waiting for the user intervention. It would be
premature for TCP to take action on its own. Hence TCP cannot act as
a resource manager and it is the system or application's
responsibility to take appropriate action.
At the same time, many existing TCP implementations that adhere 2. Discussion on RFC 1122 Requirement
strictly to the above verbiage of RFC 1122 may fall victim to DOS
attacks, if appropriate measures are not followed. For example, if
we take the case of a busy server where multiple clients can
advertise a zero forever (by reliably acknowledging the ZWP's), it
could eventually lead to the resource exhaustion in the system. In
such cases the system would need to take appropriate action on the
TCP connection to reclaim the resources.
This document is not intended to provide any advice on any particular Per [RFC1122] as long as the ACK's are being received for window
resource management scheme that can be implemented to circumvent DOS probes, a connection can continue to stay in the persist condition.
issues arising due to the connections stuck in the persist state. This is an important feature because typically applications would
want the TCP connection to stay open unless an application explicitly
closes the connection.
The problem is applicable to TCP and TCP derived transport protocols For example take the case of user running a network print job during
like SCTP. which the printer runs out of paper and is waiting for the user
intervention to reload the paper tray. The printer may not be
reading data from the printing application during this time.
Although this may result in a prolonged ZWP state, it would be
premature for TCP to take action on its own and close the printer
connection merely due to its lack of progress. Once the printer's
paper tray is reloaded (which may be minutes, hours, or days later),
the print job should be able to continue uninterrupted over the same
TCP connection.
In summary, TCP MUST NOT take any action on its own to abort a Systems that adhere too strictly to the above verbiage of [RFC1122]
connection in persist condition. Applications however can request may fall victim to DoS attacks, by not supporting sufficient
that a connection in persist condition be aborted. The resource mechanisms to allow release of system resources tied up by
manager in the operating system when faced with depleted resources connections in the persist condition during times of resource
can also ask TCP to abort a connection. exhaustion. For example, consider the case of a busy server where
multiple (attacker) clients advertise a zero window forever (by
reliably acknowledging the ZWPs). This could eventually lead to
resource exhaustion in the server system. In such cases the
application or operating system would need to take appropriate action
on the TCP connection to reclaim their resources and continue to
persist legitimate connections.
3. Description of Attack The problem is applicable to TCP and TCP derived flow-controlled
transport protocols like SCTP.
If TCP implementations strictly follow RFC 1122 and there is no Clearly, a system should be robust to such attacks and allow
instruction on what to do in persist condition, connections will connections in the persist condition to be aborted in the same way as
encounter an indefinite wait. To illustrate this, consider the case any other connection. Section 4 of this document provides the
where the client application opens a TCP connection with a HTTP requisite clarification, in standards language, to permit such
[RFC2616] server, sends a GET request for a large page and stops resource management.
reading the response. This would cause the client TCP to advertise a
zero window to the server. For every large HTTP response, the server
is left holding on to the response data in its send queue. The
amount of response data held will depend on the size of the send
buffer and the advertised window. If the client never reads the data
in its receive queue or clears the persist condition, the server will
continue to hold that data indefinitely. Multiple such TCP
connections stuck in the same scenario on the server would cause
resource depletion resulting in a DoS situation on the server.
Applications on the sender can transfer all the data to the TCP 3. Description of one Simple Attack
socket and subsequently close the socket leaving the connection with
no controlling process, hereby referred to as orphaned connection.
If the application on the receiver refuses to read the data, the
orphaned connection will be left holding the data indefinitely in its
send queue.
To illustrate a potential DoS scenario, consider the case where many
client applications open TCP connection with a HTTP [RFC2616] server,
and each sends a GET request for a large page and stops reading the
response partway through. This causes the client's TCP
implementation to advertise a zero window to the server. For every
large HTTP response, the server is left holding on to the response
data in its sending queue. The amount of response data held will
depend on the size of the send buffer and the advertised window. If
the clients never read the data in their receive queues in order to
clear the persist condition, the server will continue to hold that
data indefinitely. Since there may be a limit to the operating
system kernel memory available for TCP buffers, this may result in
DoS to legitimate connections by locking up the necessary resources.
If the above scenario persists for an extended period of time, it If the above scenario persists for an extended period of time, it
will lead to TCP buffers and connection blocks starvation causing will lead to TCP buffers and connection blocks starvation causing
legitimate existing connections and new connection attempts to fail. legitimate existing connections and new connection attempts to fail.
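As a rough illustration of the scale involved, the following sketch estimates how much send-queue data a server could be left holding when many connections stall in the persist condition. It assumes each stalled connection pins at most one send buffer of unsent response data; the function names and the figures in the usage note are illustrative assumptions, not values taken from either draft revision.

```c
#include <stdint.h>

/* Data pinned by one stalled connection: the unsent part of the response,
 * capped by the socket send buffer (assumed model, not from the draft). */
static uint64_t held_per_connection(uint64_t unsent_response_bytes,
                                    uint64_t send_buffer_bytes)
{
    return unsent_response_bytes < send_buffer_bytes
               ? unsent_response_bytes
               : send_buffer_bytes;
}

/* Total data pinned across all connections stuck in the persist condition. */
static uint64_t held_total(uint64_t stalled_connections,
                           uint64_t unsent_response_bytes,
                           uint64_t send_buffer_bytes)
{
    return stalled_connections *
           held_per_connection(unsent_response_bytes, send_buffer_bytes);
}
```

For example, under these assumptions 10000 stalled connections, each with a 1 MiB response outstanding and a 64 KiB send buffer, would pin 655360000 bytes of buffer memory.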
A clever application might detect such attacks with connections that
are not making progress, and could close these connections. However,
some applications might have transferred all the data to the TCP
socket and subsequently closed the socket leaving the connection with
no controlling process, hereby referred to as orphaned connections.
Such orphaned connections might be left holding the data indefinitely
in their sending queue.
CERT has released an advisory in this regard[VU723308] and is making CERT has released an advisory in this regard[VU723308] and is making
vendors aware of this DoS scenario. vendors aware of this DoS scenario.
4. Clarification Regarding RFC 1122 Requirements Appendix A of this document provides a simple mitigation to such
attacks. More sophisticated attacks are possible which can build on
A consequence of adhering to the above requirement mandated by RFC this vulnerability and may remain effective even when mitigated with
1122 is that multiple TCP receivers advertising a zero window to a the mechanism prescribed in Appendix A of this document.
server could exhaust the connection and buffer resources of the
sender. In such cases, and specially when the receiver is reliably
acknowledging zero window probe, to achieve robustness, the system
should be able to take appropriate action on those TCP connections
and reclaim resources. A possible action could be to terminate the
connection and such an action is in the spirit of RFC 1122.
In order to accomplish this action, TCP MAY provide a feedback 4. Clarification Regarding RFC 1122 Requirements
regarding the persist condition to the application if requested to do
so or the application or the resource manager can query the health of
the TCP connection which would allow it to take the desired action.
All such actions are in complete compliance of RFC 793 and RFC 1122.
5. Conclusion As stated in [RFC1122], a TCP implementation MUST NOT close a
connection merely because it seems to be stuck in the ZWP or persist
condition. Unstated in RFC 1122, but implicit for system robustness,
a TCP implementation MUST allow connections in the ZWP or persist
condition to be closed or aborted by their applications or other
resource management routines in the operating system.
The document addresses the fact that terminating TCP connections In order to provide some level of robustness to DoS attacks, a TCP
stuck in the persist condition does not violate RFC 1122 or RFC 793. implementation MAY provide a feedback regarding the persist condition
It also suggests that TCP must not abort any connection until to the application if requested to do so or an application or other
explicitly requested by the application or the operating system to do resource manager can query the health of the TCP connection allowing
so. The potential implementation guidelines of the request and the it to take the desired action. All such techniques are in complete
action are documented in Section 7, and the details of mitigating the compliance of [RFC0793] and [RFC1122].
DoS attack are left to the implementer.
6. Acknowledgments 5. Acknowledgments
This document was inspired by the recent discussions that took place This document was inspired by the recent discussions that took place
regarding the TCP persist condition issue in the TCPM WG mailing list regarding the TCP persist condition issue in the TCPM WG mailing list
[TCPM]. The outcome of those discussions was to come up with a draft [TCPM]. The outcome of those discussions was to come up with a draft
that would clarify the intentions of the ZWP referred by RFC 1122. that would clarify the intentions of the ZWP referred by RFC 1122.
We would like to thank Mark Allman and David Borman for clarifying We would like to thank Mark Allman, Ted Faber and David Borman for
the objective behind this draft. To Dan Wing, Mark Allman and clarifying the objective behind this draft. To Wesley Eddy for his
extensive editorial comments and to Dan Wing, Mark Allman and
Fernando Gont on providing feedback on the document. Fernando Gont on providing feedback on the document.
7. Programming Considerations 6. Appendix A, Programming Considerations
As a potential implementation guideline, the authors are documenting As a potential implementation guideline, the authors are documenting
some of the programming considerations. This should not be in any some of the programming considerations. This should not be in any
way construed as the only way that the mitigation against the DoS way construed as the only way that the mitigation against the DoS
condition can be achieved. Applications can choose their own condition can be achieved. Applications can choose their own
implementations on how to deal with this DoS sceanrio. implementations on how to deal with this DoS scenario, and should be
aware that this mitigation is only effective at combating the simple
attack scenario described in this document, and does not handle even
slightly more sophisticated attacks based on the same or similar
concepts.
Note, this persist condition is mutually exclusive from a persist
condition where we are not getting zero windows acknowledgement for
the probes.
The technique described here allows an application to specify to the
operating system that it consents to aborting such connections.
Implementers can additionally choose to provide an asynchronous
notification interface to inform the application of the connection in
the persist condition, if they want the application to abort the
connection. In the case where the application has terminated or
orphaned the connection, the TCP or kernel code will go ahead and
clear the connection and reclaim its resources.
The key consideration in putting a solution together is to be able to The key consideration in putting a solution together is to be able to
detect a connection that is in persist condition. The application detect a connection that is in persist condition. The application
through the socket interface can inform TCP or kernel of how long through the socket interface will be able to inform TCP
they are willing to wait in persist condition. When the connection implementation or kernel of how long they are willing to have
reaches that particular timeout value a EPERSISTTIMEOUT notification connections wait in the persist condition.
will be sent to the application. The application on receiving the
notification can turn around and issue a close. In the case, the
application has terminated, TCP or kernel will go ahead and clear the
connection and reclaim the resoruces. Note, this persist condition
is mutually exclusive from a persist condition where we are not
getting zero windows acknowledgement for the probes.
PERSIST_TIMEOUT PERSIST_TIMEOUT
Format: Format:
int setsockopt (sockfd, SOL_TCP, SO_PERSISTTIMEO, int setsockopt (sockfd, SOL_TCP, SO_PERSISTTIMEO,
persist_timeout_value, length) persist_timeout_value, length)
int getsockopt (sockfd, SOL_TCP, SO_PERSISTTIMEO, int getsockopt (sockfd, SOL_TCP, SO_PERSISTTIMEO,
persist_timeout_value, length) persist_timeout_value, length)
where persist_timeout_value recorded in seconds is of type int and where persist_timeout_value recorded in seconds is of type int, the
the length is four. length is set to four.
The above interface allows applications to inform TCP that when the The above interface allows applications to inform TCP what to do when
local connection stays in persist condition it can be aborted after a the local connection stays in the persist condition. Note that the
set time. Note that the default value of this option is infinite. default value of persist_timeout_value is -1 which implies it is
infinite.
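A minimal sketch of how an application might wrap the interface described above is shown here. SO_PERSISTTIMEO is the option proposed in this appendix; no deployed stack is known to define it, so the numeric value used below is a placeholder assumption and the calls should be expected to fail on existing kernels. The helper names set_persist_timeout and get_persist_timeout are hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Placeholder value: SO_PERSISTTIMEO is only proposed in this document and
 * is not assumed to exist in any real system header. */
#ifndef SO_PERSISTTIMEO
#define SO_PERSISTTIMEO 0x7ffe
#endif

/* SOL_TCP is not defined on every platform; IPPROTO_TCP is the portable level. */
#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP
#endif

/* Ask TCP to allow aborting the connection after `seconds` in the persist
 * condition; -1 requests the default (infinite) behavior.
 * Returns 0 on success, or -1 with errno set (for example when the stack
 * does not implement the option). */
static int set_persist_timeout(int sockfd, int seconds)
{
    return setsockopt(sockfd, SOL_TCP, SO_PERSISTTIMEO,
                      &seconds, sizeof(seconds));
}

/* Read back the configured timeout in seconds. Returns 0 on success,
 * or -1 with errno set. */
static int get_persist_timeout(int sockfd, int *seconds)
{
    socklen_t length = sizeof(*seconds);
    return getsockopt(sockfd, SOL_TCP, SO_PERSISTTIMEO, seconds, &length);
}
```

An application would typically call set_persist_timeout() once after the connection is established and treat a failure as "option not supported", falling back to its own progress monitoring.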
TCP sender will save the current time in the connection block when it TCP sender will save the current time in the connection block when it
receives a zero window ACK. This time is referred to as the persist receives a zero window ACK. This time is referred to as the persist
entry time. Thereafter every time the probe timer expires and before entry time. Thereafter every time the probe timer expires and before
it sends another probe or an ACK carrying zero window is received a it sends another probe or an ACK carrying zero window is received a
check will be done to see how long the connection has been in persist check will be done to see how long the connection has been in persist
condition by comparing the current time to the persist entry time. condition by comparing the current time to the persist entry time.
If the timeout has been exceeded, the connection will be aborted. If the timeout has been exceeded, the connection will be aborted.
Any time an ACK is received that advertises a non-zero window, the Any time an ACK is received that advertises a non-zero window, the
persist entry time is cleared to take the connection out of persist persist entry time is cleared to take the connection out of the
condition. persist condition.
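The bookkeeping described above can be sketched as follows. This is an illustration under stated assumptions rather than code from any TCP stack: the conn_block fields, the function names, and the use of a caller-supplied clock value in seconds are all hypothetical, and "exceeded" is read as strictly greater than the configured timeout.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERSIST_TIMEOUT_INFINITE (-1) /* default value described in the text */

/* Hypothetical slice of a connection block. */
struct conn_block {
    bool    in_persist;          /* true while the peer advertises a zero window */
    int64_t persist_entry_time;  /* time of the first zero window ACK, in seconds */
    int     persist_timeout;     /* seconds, or PERSIST_TIMEOUT_INFINITE */
};

enum persist_action { PERSIST_CONTINUE, PERSIST_ABORT };

/* Compare the current time with the persist entry time. */
static enum persist_action check_persist(const struct conn_block *cb, int64_t now)
{
    if (!cb->in_persist || cb->persist_timeout == PERSIST_TIMEOUT_INFINITE)
        return PERSIST_CONTINUE;
    return (now - cb->persist_entry_time > cb->persist_timeout)
               ? PERSIST_ABORT
               : PERSIST_CONTINUE;
}

/* Called when the probe timer expires, before another probe is sent. */
static enum persist_action on_probe_timer(const struct conn_block *cb, int64_t now)
{
    return check_persist(cb, now);
}

/* Called for every received ACK with the window it advertises. */
static enum persist_action on_ack(struct conn_block *cb, uint32_t window, int64_t now)
{
    if (window != 0) {
        cb->in_persist = false;        /* window reopened: leave persist */
        return PERSIST_CONTINUE;
    }
    if (!cb->in_persist) {
        cb->in_persist = true;         /* first zero window ACK: record entry time */
        cb->persist_entry_time = now;
        return PERSIST_CONTINUE;
    }
    return check_persist(cb, now);     /* repeated zero window ACK */
}
```

Passing the current time in as a parameter keeps the logic independent of any particular clock source; a caller that receives PERSIST_ABORT would then abort the connection and reclaim its resources.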
8. Informative References 7. Informative References
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, [RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, September 1981. RFC 793, September 1981.
[RFC1122] Braden, R., "Requirements for Internet Hosts - [RFC1122] Braden, R., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, October 1989. Communication Layers", STD 3, RFC 1122, October 1989.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[TCPM] TCPM, "IETF TCPM Working Group and mailing list [TCPM] TCPM, "IETF TCPM Working Group and mailing list
http://www.ietf.org/html.charters/tcpm-charter.html". http://www.ietf.org/html.charters/tcpm.charter.html".
[VU723308] [VU723308]
Manion, "Vulnerability in Web Servers Manion, "Vulnerability in Web Servers
http://www.kb.cert.org/vuls/id/723308", July 2009. http://www.kb.cert.org/vuls/id/723308", July 2009.
Authors' Addresses Authors' Addresses
Murali Bashyam Murali Bashyam
Ocarina Networks, Inc Ocarina Networks, Inc
42 Airport parkway 42 Airport Parkway
San Jose, CA 95110 San Jose, CA 95110
USA USA
Phone: +1 (408) 512-2966 Phone: +1 (408) 512-2966
Email: mbashyam@ocarinanetworks.com Email: mbashyam@ocarinanetworks.com
Mahesh Jethanandani Mahesh Jethanandani
Cisco Systems Cisco
170 Tasman Drive 170 Tasman Drive
San Jose, CA 95134 San Jose, CA 95134
USA USA
Phone: +1 (408) 527-8230 Phone: +1 (408) 527-8230
Email: mahesh@cisco.com Email: mahesh@cisco.com
Anantha Ramaiah Anantha Ramaiah
Cisco Systems Cisco
170 Tasman Drive 170 Tasman Drive
San Jose, CA 95134 San Jose, CA 95134
USA USA
Phone: +1 (408) 525-6486 Phone: +1 (408) 525-6486
Email: ananth@cisco.com Email: ananth@cisco.com
 End of changes. 37 change blocks. 
123 lines changed or deleted 143 lines changed or added
