Balance 20 - Load Balancing Broken?

Tim:

Could it be that the WAN that is not receiving traffic is actually offline, but the health check can't detect it because it is still getting an OK from the router cache?

I have some customers who complain that sometimes the unit isn't balancing, and when I review it, the WAN is down for lack of payment and the ISP is answering the pings and DNS checks with a redirect to a portal that says: “please pay”.

So what I propose is this.

When this happens, put a machine on the LAN and make an enforced rule that says: go to the WAN that is not being used. Then try to send some traffic (a speedtest) and see what happens.

For example if my lan is 192.168.1.58

AG

The issue as I reported was a “hanging connection” process on the ISP’s side. Once all those processes were cleared, the router connected and resumed normal operation.

When the router tried to connect to the WAN using PPPoE, it went through a series of “failed login userid/password” errors before claiming to connect.

There was nothing in the logs to indicate an issue, so when I investigated to figure out what was going on, nothing on the router led me to the cause.

I see…

What health check do you use?

Use ping to a known server (google.com, 8.8.8.8, 8.8.4.4) or DNS to OpenDNS or Google (208.67.222.222, 8.8.8.8, 8.8.4.4), and increase the timeout (the default is 5; increase it to 10).

I’m glad your problem is finally resolved, and I agree with you that sometimes the log verbosity needs to be increased on certain WANs (dedicated or MPLS) or reduced to 0 on others (3G or cheap DSL), so that when we get an alarm it is not about “System: Time synchronization successful” or other messages like that.
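To see for yourself what a DNS check against a public resolver does, here is a rough Python sketch you could run from a machine on the LAN (ideally one pinned to the suspect WAN with an enforced rule). It is not how the Peplink implements its health check; the function names, the test hostname and the pass/fail rule are my own assumptions for illustration. It sends one DNS query over UDP and counts any matching reply within the timeout as a pass.

```python
import secrets
import socket
import struct


def build_query(hostname, query_id):
    """Build a minimal DNS query for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    labels = hostname.strip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def dns_health_check(server, hostname="example.com", timeout=10.0):
    """Return True if server sends back a reply with our query ID in time."""
    query_id = secrets.randbelow(65536)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(hostname, query_id), (server, 53))
            data, _ = sock.recvfrom(512)
        except OSError:
            # Timeout, unreachable network, and so on: treat as a failed check.
            return False
    return len(data) >= 12 and struct.unpack("!H", data[:2])[0] == query_id
```

For example, `dns_health_check("8.8.8.8")` or `dns_health_check("208.67.222.222", timeout=10.0)`. Note that any reply counts as a pass here, so a resolver that still answers on a walled-off link would also pass.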

AG

On the dashboard, there’s a “connection status” for each WAN. I was using that, in combination with the Real Time bandwidth usage logs and some multi-connection D/L test runs to see what was working and what wasn’t.

Ok, I see…

In Network > WAN, in the options where you configure the WAN (DHCP, PPPoE), there is an option called Health Check.


That’s what I’m talking about…

The normal setup is to run the health check against the ISP's DNS servers (“Use first two DNS servers as Health Check DNS Servers”), but since the ISP is not connected to the internet yet is still resolving DNS queries, the Peplink thinks the link is OK when it is not.

There isn't a golden rule for all ISPs: some give you an IP but don't forward traffic, others reduce the bandwidth, others stop giving you DHCP, and some do a fancier forward to the billing department. So you have to work out the best approach for these values for each ISP you use.

The one that never fails is HTTP, where you point the WAN at a server of your own and request a secret URL with a token, like http://peplink.mywebprovider.com/secretstuff/index.html, which returns an HTML page that just says “peplinkrocks”; you then use “peplinkrocks” as the matching string for the health check. But it has happened that the web provider at mywebprovider.com fails, and then all the Peplinks think all their WANs are down, and bad things happen…

My advice is to use Google as DNS (as in my example), but keep an open mind for exceptions with some ISPs.
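The idea behind the HTTP check with a matching string can be sketched in a few lines of Python. This is only an illustration of the logic, not the Peplink's own code; `fetch_body` and `http_health_check` are names I made up, and the URL and string are the example ones from above. The point is that a “please pay” portal page will not contain the expected string, so the check fails even though the ISP answered.

```python
import http.client
import urllib.request


def fetch_body(url, timeout):
    """Fetch the page and return up to 64 KiB of it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read(65536).decode("utf-8", errors="replace")


def http_health_check(url, expected, timeout=10.0, fetch=fetch_body):
    """Return True only if the page was fetched and contains expected."""
    try:
        body = fetch(url, timeout)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection errors, timeouts, HTTP error statuses, malformed URLs.
        return False
    return expected in body
```

Usage would look like `http_health_check("http://peplink.mywebprovider.com/secretstuff/index.html", "peplinkrocks")`. The `fetch` argument is only there so the logic can be tried without a real server. As said above, if that one server goes down, every WAN checked this way looks down at once.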

AG

Hi Tim,

Do open a support ticket here if you still encounter the issue. Peplink support will help you check the issue further.