Internet Fails on 1 WAN and stops internet access

Hi Everyone,

I have a Balance 310X that I have been very happy with for the past week. I have two WAN connections coming from ADSL modems, both set to first priority, and a backup cellular SIM card in case they go down.
I set up SpeedFusion Cloud (SFC) and tunnelled traffic to it via the outbound policy. All working fine.

I initially tested the hot failover by pulling the ethernet cables from the front of the unit - again all worked perfectly.

So during my first live event, all of a sudden I have no internet connection on any computer connected to the 310X. I log into the admin page to find both WANs are ‘connected’ and my cellular is on ‘standby’. Everything looks normal.

I soon notice that my router (Vodafone UK) supplying the connection to my WAN 2 has no incoming internet connection. It seems the line may have been disconnected at the cabinet by an engineer. This however causes the 310X to deny internet access to all devices, despite still having WAN 1 working perfectly and the backup SIM ready to go if the connections fail.

It seems to me that this shouldn’t be the case - it basically means the 310X is completely useless as a hot failover device if the internet fault is before the ADSL router. I know my router throws up a sign-in page that informs you there is no connection. Could this be throwing the Peplink unit off?

I have changed the outbound policies and even tried without the pre-set ‘HTTPS persistence’ policy but it remains the same - routing through SFC or not, there is no internet access whenever the router with the faulty line is connected. The moment I unplug the offending WAN 2, I get internet access back via the WAN 1 connection.

Another thing to note is that if I restart the router with the faulty line connected, the 310X has trouble establishing a connection to SpeedFusion Cloud - often never connecting at all.

Anyone have any ideas? Is there something I can do to keep access in this situation?

Thanks!

Are you sure the SFC settings have all WANs in Priority one? Do a screen grab of the Status > Speedfusion view.

I have Virgin Media Cable here and when it fails my Balance One detects that fine using both DNS healthcheck and my usual HTTP healthcheck.

However, when using SpeedFusion, that link failure should be detected and the WAN bypassed automatically anyway - even if the link has not been marked as bad on the dashboard, since SpeedFusion monitors WAN link health independently.

Thanks Martin,

Yes, they are both priority one in SFC, and both always-on in the network settings.

It seems to be really random. I can disconnect WAN 2 via the dashboard and the internet jumps back into life for all computers connected to the Balance. I then enable it again and I may have a connection for a couple of minutes, and then I lose it again. This happens both when using SFC and when using other policies.

I can see that SFC has detected there is no connection on WAN 2, but the dashboard still always shows it as connected. I feel it must be something to do with the Vodafone router. What is also strange is that I can’t connect to SFC if I reboot the router with the dodgy internet line connected.

I’m hoping I may have now solved it. I have changed the Health check DNS server and it now recognises my downed connection and doesn’t pass traffic through it.

I will test again when my second line gets fixed!

Thanks

In your status screenshot above SFC is not using WAN2 - so it has detected that the link has an issue, as you said.
As such, traffic going via SFC will pass via WAN1 and should still work.

I would double check your outbound policies and the SFC settings to make sure that the device on the LAN that you are testing with is actually using SFC. Visit whatismyip.com and see whether the address shown is one of your WAN links or not.

Hi Martin,

Thanks again for your reply. I’ve checked my SFC settings and outbound policies and everything seems to add up. When connected to SFC via my outbound policy, my public IP shows the SFC server; when not routing through SFC, my IP is my router’s public IP. Again everything seems normal!

Changing my Health check DNS to 1.1.1.1 seems to have given me internet access back, but I’m pretty sure it should have detected the downed connection without having to do that, and I think the Peplink should be able to connect to SFC with the downed WAN anyway.
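
For anyone wondering why the choice of health check DNS server could matter, here is a minimal Python sketch of what a DNS-style health check does. It is not Peplink's implementation, just an illustration under some assumptions: the function names, the `example.com` test hostname and the 2 second timeout are all made up for the example. One possible explanation for the behaviour in this thread (unconfirmed) is that the ADSL router answers DNS queries itself even when its line is down, so a check aimed at the router keeps passing, whereas a check aimed at a public resolver such as 1.1.1.1 has to cross the faulty line to get a reply.

```python
import socket
import struct


def build_dns_query(hostname: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.strip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def is_healthy_reply(reply: bytes, txid: int = 0x1234) -> bool:
    """True if reply is a DNS response with our ID, no error code and an answer."""
    if len(reply) < 12:
        return False
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return rid == txid and is_response and rcode == 0 and ancount > 0


def dns_health_check(server: str, hostname: str = "example.com",
                     timeout: float = 2.0) -> bool:
    """Send one query to server over UDP port 53 and judge the reply.

    Note: this uses the host's default route; a real multi-WAN router
    would send the probe out of the specific WAN being tested.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_dns_query(hostname), (server, 53))
            reply, _addr = sock.recvfrom(512)
        except OSError:
            # Timeouts and unreachable-network errors both count as a failure.
            return False
    return is_healthy_reply(reply)
```

Calling `dns_health_check("1.1.1.1")` from a machine behind the suspect line would be one way to try it. Note that a router which hijacks DNS and returns the address of its own sign-in page would still pass a check like this, which is why the target server matters more than the check itself.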

Curious stuff! Here are a few screenshots of my SFC tunnel and outbound policy, in case I’ve been an idiot, which isn’t entirely uncommon! Just a very simple SFC tunnel, no FEC or smoothing, WAN 1 & 2 high priority with Cellular and USB secondary. Also a screenshot of an outbound policy to steer all traffic through it.

Just a thought: since you only allow traffic through the “Cloud: SFC”, does the router think that the SFC is “unhealthy” when one of the links is being re-established? The way I understand the Priority policy is that “traffic will route through the highest priority link in a healthy state”. If the router deems that tunnel to be “unhealthy”, there is no other option to route the traffic. If it were me, I would put WAN1 and WAN2 in the list underneath the SFC. You may also want to try the “Enforced” outbound policy for the way you have things configured. That policy may disregard the “health” of the tunnel completely. If the tunnel has any links that are established, your traffic should make it to the cloud endpoint.
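
To make the difference concrete, here is a toy Python model of the two behaviours as described in this post. It is only a sketch of the reasoning, not how the router is implemented; the `Link` class and both function names are invented for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    name: str
    healthy: bool


def pick_priority(links: List[Link]) -> Optional[Link]:
    """Priority-style: first healthy link in list order, or None if none is healthy."""
    for link in links:
        if link.healthy:
            return link
    return None


def pick_enforced(link: Link) -> Link:
    """Enforced-style: always the named link, whatever its health state."""
    return link


sfc = Link("Cloud: SFC", healthy=False)
wan1 = Link("WAN 1", healthy=True)
wan2 = Link("WAN 2", healthy=True)

# With only the tunnel in the list, an "unhealthy" tunnel leaves nowhere to go.
print(pick_priority([sfc]))
# With the WANs listed underneath, traffic falls through to the next healthy link.
print(pick_priority([sfc, wan1, wan2]).name)
# Enforced keeps sending traffic to the tunnel regardless.
print(pick_enforced(sfc).name)
```

In this model a single-entry priority list has no fallback at all, which is the situation the post is warning about.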

FWIW, my experiences have shown that the ping health check works the best. I use my default gateway as the ping target. I have static IPs, so it is always the same. I don’t know of a way to implement it when a dynamic IP is being used.
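
A rough Python sketch of a ping-style health check with retry counting follows, for anyone who wants to experiment from a PC on the LAN. The assumptions are stated up front: `ping_once` relies on the Linux `ping` flags `-c` and `-W`, and the `HealthTracker` class with its default of 3 failures and 3 recoveries is hypothetical rather than taken from any router's settings.

```python
import subprocess
from typing import Callable


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """One ICMP echo via the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


class HealthTracker:
    """Flip state only after several consecutive results disagree with it."""

    def __init__(self, probe: Callable[[], bool],
                 fail_retries: int = 3, recovery_retries: int = 3) -> None:
        self.probe = probe
        self.fail_retries = fail_retries
        self.recovery_retries = recovery_retries
        self.healthy = True
        self._streak = 0

    def check(self) -> bool:
        ok = self.probe()
        if ok == self.healthy:
            # Result agrees with the current state, so reset the counter.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.recovery_retries if ok else self.fail_retries
            if self._streak >= needed:
                self.healthy = ok
                self._streak = 0
        return self.healthy
```

A tracker could be built with something like `HealthTracker(lambda: ping_once("192.0.2.1"))`, where the address is a placeholder for the gateway. One caveat: if the WAN's default gateway is the ISP router itself, as appears to be the case with the Vodafone router earlier in the thread, pinging it would likely keep succeeding even when the line behind it is down, so the ping target needs to sit beyond the segment that can fail.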

Hi there,

Thanks for the input. I did actually have WAN 1 & 2 lower in the priority list under the SFC when the problem first arose. I then tried multiple combinations of policies without resolving the situation. I figured I would factory reset the router and then set up the simplest possible SFC and outbound policy configuration in order to post here for any suggestions.

Great to know about your preference for the default gateway ping health check, I may well test that out. But so far setting a specific DNS server for the health check has solved the problem entirely, and now that the offending ADSL line is fixed I can mess around with combinations of test failures and hopefully find out which setup is best, and not have heart palpitations for my next live stream.

If I find anything in particular I’ll post my findings here in case anyone has similar issues.

Stick with it. Those that have figured out the SFC stuff really like it. Glad to hear that you have the health check issue resolved. Good luck with your networking adventure.