Managing an up/down WAN connection seamlessly

Greetings-

NOOB question here. I’ve got a Peplink Balance 30 with a Comcast WAN connection and a Verizon 4G modem as the failover. This morning Comcast was having service issues: the connection has been up and down all morning, every few minutes, sometimes up for only a minute before dropping again. For the end users, the VoIP phones have been up and down, streaming radio has been inconsistent, and the internet has felt unpredictable (despite a good 15 Mbps connection on the modem).

I’m assuming there’s some way to configure the router to perform better in a situation like this? We actually just ended up unplugging the WAN connection to let things settle out on the 4G connection.

Many thanks in advance!

Cheers,
Ryan

Hi Ryan-

The default health check setting is DNS Lookup; both the method and its parameters are configurable. Here is a link to more information:
http://www.peplink.com/knowledgebase/health-check-mechanisms-against-link-failure/

Thanks Tim-

I’ve got that much configured. I was more looking for best practices given that the defaults were causing sketchy behavior when the primary WAN connection was up and down with relatively high frequency. My sense is that the right answer is to perhaps decrease the number of health retries (3 * 5 secs is 15 seconds of downtime before a failover) and perhaps decrease the number of recovery retries, but was curious to get the wisdom of others.

The Peplink will do pings or lookups, and wait for x failed / good sequential attempts before it changes state. That decision process is OK for a complete loss of a WAN, but not effective for a half-dead connection. For example, a WAN with a persistent 20% packet loss will both fail and pass health checks. It makes for a bad-quality connection when it’s used, though. Unfortunately, judging the quality of a connection needs a better test and interpretation, and the Peplink can’t do that.
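
A rough back-of-the-envelope sketch of that point. It assumes each health check is a single independent probe that is lost with the packet-loss probability, and that 3 consecutive results are needed to change state in either direction. Both are illustrative assumptions, not measured Peplink behavior:

```python
# Assumption: each health check is one independent probe that is lost
# with probability `loss`. Real packet loss is usually burstier than this.
loss = 0.20      # persistent 20% packet loss
retries = 3      # assumed consecutive failures/passes needed to change state

# Chance that a given run of `retries` checks all fail (link marked down).
p_mark_down = loss ** retries
# Chance that a given run of `retries` checks all pass (link marked up again).
p_mark_up = (1 - loss) ** retries

print(f"all {retries} checks fail: {p_mark_down:.3f}")
print(f"all {retries} checks pass: {p_mark_up:.3f}")
```

Under those assumptions the link is rarely marked down, and it recovers easily when it is, even though one packet in five is being dropped the whole time.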

rossh

Actually, for this scenario you would INCREASE the number of recovery retries so the Balance is less likely to bring up an unstable WAN link once it is determined to be down.

This might be me misunderstanding the retry logic, so correct me if I’m wrong. But if you’re experiencing 50% packet loss, that means for any given retry you have a 1/2 chance of getting a good connection. If the retry logic is additive - i.e. for all five retries you need to get a positive connection each time, then that makes sense (going from a 1/2 chance of bringing up a bad connection to 1/32 chance of bringing up a bad connection). But if the retry logic is independent - keep trying until you get a good connection and then bring it up, that seems as though you’d actually increase your chance of finding a packet that goes through and thus you bring up the WAN connection again.
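
Putting numbers on the two readings, as a quick sketch that assumes each check passes or fails independently (the function names are just mine for illustration):

```python
from fractions import Fraction

def p_all_pass(loss, retries):
    """Recovery requires every one of `retries` checks to pass in a row."""
    return (1 - loss) ** retries

def p_any_pass(loss, retries):
    """Recovery happens as soon as any one of `retries` checks passes."""
    return 1 - loss ** retries

loss = Fraction(1, 2)        # 50% packet loss
print(p_all_pass(loss, 5))   # 1/32
print(p_any_pass(loss, 5))   # 31/32
```

So the answer flips completely depending on which of the two the router actually does.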

Ron is correct, you would want to increase the number of recovery retries in your case. They are additive or consecutive, meaning a setting of 10 would require 10 consecutive health check passes before bringing the connection back up.

The health retries setting is the number of consecutive failures needed before treating a connection as down. In your case you could lower this number from the default of 5, the most aggressive being 1.
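
As a minimal sketch of the consecutive-count logic described above: this is a hypothetical model for illustration, not Peplink's actual code, and the class and parameter names are made up.

```python
class WanHealth:
    """Hypothetical model of consecutive-count health checking."""

    def __init__(self, health_retries, recovery_retries):
        self.health_retries = health_retries      # consecutive failures to go down
        self.recovery_retries = recovery_retries  # consecutive passes to come back up
        self.up = True
        self.streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed):
        """Feed in one health check result and return the resulting link state."""
        contradicts = check_passed != self.up
        # Any result that agrees with the current state resets the streak.
        self.streak = self.streak + 1 if contradicts else 0
        needed = self.health_retries if self.up else self.recovery_retries
        if self.streak >= needed:
            self.up = not self.up
            self.streak = 0
        return self.up

# Aggressive failover (1 failure), cautious recovery (10 passes in a row).
wan = WanHealth(health_retries=1, recovery_retries=10)
results = [False] + [True] * 9 + [False] + [True] * 10
states = [wan.record(r) for r in results]
```

In this run the link goes down on the first failure, the single failure after nine good checks resets the recovery count, and the link only comes back after ten passes in a row.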

You should be able to fine tune these settings for this specific connection to suit your needs.

I would also complain to Comcast that their connection is lousy; you are a paying customer and this level of service seems unacceptable. Maybe they need to arrange a service call to see what is going on.

Very helpful Tim. Thanks much!

As to Comcast, it’s generally good. This is the first time in 7 months it’s gone down in a major way, and it went down in a way that circumvented my best effort at minimizing the impact. From this thread, I think I’ve got a good sense of how to correct for the future. I’d say it will probably never happen again…but…it’s Comcast :wink:

Sounds good Ryan, thanks for the update.

This might work when the packet loss is at a perfectly consistent rate, but it rarely is. There is normally some erratic behavior to packet loss, which will defeat the longer test count.

In addition, if the packet loss increases when the network is under load, then the WAN will come back on, and it will fail again as the packet loss goes up.

In practice the logic is not good enough to see that the connection is unreliable. The Peplink will keep cycling the WAN off and on, which will be frustrating to use.
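
A small sketch of that off / on cycle, using a made-up bursty loss pattern and a simplified model of the consecutive-count logic (the function name, the pattern, and the retry values are all assumptions for illustration):

```python
def count_state_changes(results, health_retries, recovery_retries):
    """Count up/down transitions for a sequence of health check results."""
    up, streak, changes = True, 0, 0
    for passed in results:
        # Count consecutive results that contradict the current state.
        streak = streak + 1 if passed != up else 0
        if streak >= (health_retries if up else recovery_retries):
            up, streak, changes = not up, 0, changes + 1
    return changes

# Hypothetical bursty link: 12 good checks, then 2 bad ones, repeated 20 times.
pattern = ([True] * 12 + [False] * 2) * 20
changes = count_state_changes(pattern, health_retries=1, recovery_retries=10)
print(changes)
```

Because each good stretch is just longer than the recovery count, the WAN is brought back every cycle and then dropped again by the next burst: 39 state changes over 280 checks in this example.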

Rossh