Proxy Failover Advice


Overview

I’m deploying a pair of HA firewall devices that will act as a transparent forward proxy for outbound traffic (traffic is directed through the proxy by routing rather than by configuring a proxy URL on the client machines). The high availability configuration is in place and working, and I can see that TCP sessions are shared across both devices. When failover is triggered, the passive device assumes the IP addresses of the previously active instance (because this is on AWS, the whole network adaptor is actually moved across).

Issue

As a test, I set up a web server and created a large HTML file. I then used a client machine to retrieve this file through my proxies with wget and curl, and performed a manual failover during the download. At the point of failover the wget download got stuck (the same happened with curl). I then added connection timeouts; the wget command timed out and restarted the download, which then worked fine, although I could see that a new TCP session had been created. One thing to note is that this is a cloud setup, where failover is a lot slower than on high-spec on-premises devices: it can take between 15 and 60 seconds to complete. I’m trying to ensure that my deployment will not have a significant impact on applications, which will mainly be sending HTTP traffic.
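
For anyone who wants to reproduce the timeout-and-retry behaviour without wget, here is a minimal Python sketch of the same idea: give every read a timeout, and when the transfer stalls, open a new connection and ask for the remaining bytes with an HTTP Range header. The function name, the default values, and the assumption that the web server honours Range requests are all mine for illustration; this is not what wget or curl do internally.

```python
import http.client
import time
import urllib.request


def download_with_resume(url, dest, timeout=10.0, max_attempts=5, wait=5.0):
    """Fetch url into dest, resuming with a Range request after a stall.

    Hypothetical helper: names and defaults are illustrative only.
    Returns the number of bytes written.
    """
    offset = 0  # bytes safely written so far
    with open(dest, "wb") as out:
        for attempt in range(1, max_attempts + 1):
            request = urllib.request.Request(url)
            if offset:
                # Ask only for the bytes we have not received yet.
                request.add_header("Range", f"bytes={offset}-")
            try:
                # The timeout covers the connect and each blocking read,
                # so a stalled transfer raises instead of hanging.
                with urllib.request.urlopen(request, timeout=timeout) as resp:
                    if offset and resp.status != 206:
                        # Server ignored the Range header: start over.
                        out.seek(0)
                        out.truncate()
                        offset = 0
                    length = resp.headers.get("Content-Length")
                    expected_end = offset + int(length) if length else None
                    while True:
                        chunk = resp.read(64 * 1024)
                        if not chunk:
                            break
                        out.write(chunk)
                        offset += len(chunk)
                    if expected_end is not None and offset < expected_end:
                        # Connection closed early; treat it as a failure.
                        raise ConnectionError("response ended early")
                return offset
            except (OSError, http.client.HTTPException):
                # Timeouts, resets and URL errors all land here.
                if attempt < max_attempts:
                    time.sleep(wait)
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

Calling `download_with_resume(url, "big.html", timeout=10, max_attempts=8, wait=10)` would keep trying for longer than a 60 second failover. Each retry opens a new TCP session, which matches what I saw with wget.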

Questions

  1. Is it reasonable to expect an HTTP download to continue after a failover, or should the client use timeouts and retries to start the download again?

  2. Am I likely to need application teams to change their timeout and retry settings? What is considered normal for timeout and retry settings for applications that regularly send API requests?

  3. Is TCP session sharing only really suitable for pure TCP connections (database connections etc.), or is there any real use for it with HTTP connections?
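
One way to frame question 2: a client rides out the failover, roughly, only if its worst-case retry window is longer than the failover time. Below is a rough sketch of that arithmetic in Python. The function names and the example timeout, retry count, and backoff values are placeholders I made up, not settings from any real application.

```python
def worst_case_retry_window(timeout, retries, first_wait, factor=2.0):
    """Seconds until a client gives up if every attempt times out.

    timeout:    per-attempt timeout in seconds
    retries:    retries after the first attempt
    first_wait: pause before the first retry, multiplied by factor each time
    """
    attempts = retries + 1
    waits = sum(first_wait * factor ** i for i in range(retries))
    return attempts * timeout + waits


def covers_failover(window, failover_seconds=60.0):
    """True if the retry window outlasts the slowest observed failover."""
    return window > failover_seconds


# Hypothetical client: 10 s timeout, 2 s initial backoff, doubling each retry.
three_retries = worst_case_retry_window(timeout=10, retries=3, first_wait=2)
four_retries = worst_case_retry_window(timeout=10, retries=4, first_wait=2)
```

With these placeholder values, three retries give a 54 second window, which falls short of a 60 second failover, while four retries give 80 seconds.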