Feature request - Intermittent Failure Detection

We have a Balance One. One of our two internet providers went into an intermittent failure mode. Our logs showed the Internet working for a minute or two, then down for 30 seconds, and this repeated for 8+ hours.
Although the Balance One handled detection and failover flawlessly (and PepVPN handled every failure), Internet access at the business was unusable!

My guess is that about half of the secure connections failed each time the router switched to the remaining good connection (outgoing SSH/HTTPS is set to Persistence)…

While the Balance One worked perfectly, the result was unsatisfactory.

Could we have settings that count the number of disconnects within a defined time period and, if the count reaches a threshold, take the link down for a specified time? For example: if there are 5 or more disconnects in 20 minutes, take the link down for 2 hours.
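What is being asked for here is essentially a flap counter with a hold-down timer. Below is a minimal Python sketch of that logic, assuming the router can record a timestamp (in seconds) each time a link goes down. The class and method names (`FlapGuard`, `record_disconnect`, `is_suspended`) are hypothetical and not part of any Peplink firmware or API.

```python
from collections import deque


class FlapGuard:
    """Hypothetical sketch: suspend a link after too many disconnects
    inside a sliding time window. All times are in seconds."""

    def __init__(self, max_disconnects=5, window=20 * 60, hold_down=2 * 60 * 60):
        self.max_disconnects = max_disconnects  # threshold, e.g. 5 disconnects
        self.window = window                    # sliding window, e.g. 20 minutes
        self.hold_down = hold_down              # suspension, e.g. 2 hours
        self.events = deque()                   # timestamps of recent disconnects
        self.suspended_until = None             # end of the current hold-down, if any

    def record_disconnect(self, now):
        """Log one disconnect and start the hold-down if the threshold is reached."""
        self.events.append(now)
        # Forget disconnects that have slid out of the window.
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()
        if len(self.events) >= self.max_disconnects:
            self.suspended_until = now + self.hold_down
            self.events.clear()

    def is_suspended(self, now):
        """True while the link should stay down regardless of health checks."""
        return self.suspended_until is not None and now < self.suspended_until
```

For example, with disconnects every 150 seconds (2 minutes up, 30 seconds down, as in the logs above) at 0, 150, 300, 450 and 600 s, the fifth disconnect falls inside the 20-minute window, so the link is held down until 7,800 s.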

Roger


SpeedFusion has a suspension-after-packet-loss setting, but it works on a scale of milliseconds, not minutes (and it is no good on a Balance ONE unless you are actually bonding, in which case the issue above shouldn't arise).

It would be a nice feature to include with normal load balancing.

I'm assuming it would be included under the WAN's “health check” options as two settings:
- Treat this WAN as down if it fails more than X times in Y minutes.
- Only allow this WAN to be utilised again after Z minutes of uptime.
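The second option amounts to a minimum-uptime gate: a link that has come back is not trusted until it has stayed up for a while. Here is a minimal Python sketch, assuming one health-check result is fed in per check and that "uptime" means continuous uptime. The name `UptimeGate` and the 10-minute default are hypothetical illustrations, not an existing Peplink setting.

```python
class UptimeGate:
    """Hypothetical sketch: only allow a WAN to carry traffic once it has
    been up continuously for `min_uptime` seconds."""

    def __init__(self, min_uptime=10 * 60):
        self.min_uptime = min_uptime  # assumed value for Z, in seconds
        self.up_since = None          # start of the current up period, None while down

    def update(self, now, link_up):
        """Feed one health-check result taken at time `now` (seconds)."""
        if not link_up:
            self.up_since = None      # any failure restarts the uptime clock
        elif self.up_since is None:
            self.up_since = now       # first success after an outage

    def usable(self, now):
        """True once the link has been up for at least `min_uptime` seconds."""
        return self.up_since is not None and now - self.up_since >= self.min_uptime
```

With the failure pattern from the first post (about 2 minutes up, 30 seconds down), no up period ever reaches 10 minutes, so the gate keeps the link out of service until it genuinely stabilises.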

Thanks


Fine-tuning the WAN health check should help with this issue.

Please check the WAN health check parameters as explained in the forum link below:

You can set Recovery Retries to a higher number to make sure the WAN is stable before traffic is sent to it again.

Recovery Retries: the number of successful health checks a failed link needs before it is considered up again.
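To see how a failure threshold and Recovery Retries interact, here is a toy Python model of a health check. It assumes that both thresholds count consecutive results (a single contrary result resets the count); the actual firmware behaviour may differ, and the names `HealthCheck`, `failure_retries` and `feed` are hypothetical.

```python
class HealthCheck:
    """Toy model of a WAN health check (assumed behaviour, not firmware code).

    `failure_retries` consecutive failed checks mark the link down;
    `recovery_retries` consecutive successful checks bring it back up.
    """

    def __init__(self, failure_retries=3, recovery_retries=20):
        self.failure_retries = failure_retries
        self.recovery_retries = recovery_retries
        self.up = True
        self.streak = 0  # consecutive results that disagree with the current state

    def feed(self, check_ok):
        """Process one health-check result and return the resulting link state."""
        if check_ok == self.up:
            self.streak = 0          # result agrees with the current state
            return self.up
        self.streak += 1
        needed = self.failure_retries if self.up else self.recovery_retries
        if self.streak >= needed:
            self.up = not self.up    # threshold reached: flip the state
            self.streak = 0
        return self.up
```

Under this model, with a 5-second interval, any up period longer than 100 seconds (20 checks × 5 s) lets the link back in, so a link that stays up for two minutes at a time would still flap.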


OK, almost as good. But… I think detection of a down link should be fast, and now that I know how this kind of link failure behaves, re-establishing the link should be slow. However, the Recovery Retries setting tops out at 20, i.e. 20 times the Check Interval. With my current settings I check every 5 seconds and mark the link down after 3 failures (15 seconds to detect a failure). 5 seconds times 20 retries is 100 seconds, less than 2 minutes. I'd like the restore time to be long while keeping detection at 15 seconds.
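The arithmetic above, worked through in a few lines of Python. The values come from the post (5 s interval, 3 failures, a cap of 20 recovery retries); the 2-hour target is the hold-down from the original request, and the variable names are purely illustrative.

```python
# Values taken from the post; names are illustrative only.
CHECK_INTERVAL_S = 5       # health check every 5 seconds
FAILURE_RETRIES = 3        # link marked down after 3 failed checks
MAX_RECOVERY_RETRIES = 20  # highest Recovery Retries value offered

detection_s = CHECK_INTERVAL_S * FAILURE_RETRIES          # 15 s to declare the link down
max_recovery_s = CHECK_INTERVAL_S * MAX_RECOVERY_RETRIES  # 100 s longest recovery delay

# Recovery retries a 2-hour hold-down would need at the same interval.
wanted_hold_down_s = 2 * 60 * 60
retries_needed = wanted_hold_down_s // CHECK_INTERVAL_S

print(detection_s, max_recovery_s, retries_needed)  # prints: 15 100 1440
```

Both times scale with the check interval, so stretching recovery by lengthening the interval slows detection by the same factor, which is exactly the coupling described above.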


This is what we would like as well. Any ideas?