Outbound Policy - priority order doesn't seem to work

Xerxes · March 2, 2022, 5:51pm

I have three WANs on my Peplink Balance One.

For my Outbound Policy, I have various rules set depending on MAC address. Most of them are set to Priority, and then I have the order of the three WANs set per MAC.

One of my providers has been having an outage today, going on 5 hours now. All of the devices that have the failed provider in priority slot #1 have had no internet for those 5 hours. They are not falling through to #2, or even #3.

I tried updating the firmware from 8.1.3 to 8.2.0, but that did not help. Is Priority order broken? Why isn’t it working like I think it should?

Xerxes · March 2, 2022, 10:18pm

I have never used that screen. It’s not really helpful. I would rather it show which WAN is in use for each connection on the Client List screen…

That said, I have never had the Priority option working on any firmware version. I have always had to use Enforced, but that doesn’t work so well when a connection goes down… It was recommended to me to change to Priority, but today’s outage reminded me why I set them to Enforced to begin with… Priority just doesn’t work.

Xerxes · March 2, 2022, 10:40pm

I’m not so sure about that. I originally had it set up to use IP addresses, with static leases for each device. However, I like to group my devices in blocks of IPs. That required reshuffling as I added or removed devices, and then going in and updating all of those IPs. Priority did not work when I was using IPs either…

It’s only been set to use MAC for the last 6 weeks or so, which solves my previously mentioned issue… I’ve always had to manage which connection each device is using, going back to firmware 7.x. And then remember to go change them all back when the connection came back up.

C_Metz · March 2, 2022, 10:48pm

I’m sorry I couldn’t help you. I’ve deleted my screenshots above for security reasons.

Xerxes · March 4, 2022, 6:37pm

Does anyone know why Priority doesn’t seem to work for me? I’m at a loss as to what to try.

C_Metz · March 4, 2022, 7:36pm

That statement usually means you have priorities set on your Dashboard page, which override any priority you have set in the outbound policy. All WANs need to be priority #1 for the outbound policy’s priorities to work. Screenshot below.
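A rough model of the behavior described above, assuming the Dashboard priority groups act as a filter applied before the outbound policy’s own WAN order. The function names and WAN labels are illustrative only, not part of any Peplink API:

```python
from typing import Dict, List, Optional

def usable_wans(group_of: Dict[str, int], is_up: Dict[str, bool]) -> List[str]:
    """WANs in the best (lowest-numbered) Dashboard priority group that has at least one WAN up."""
    live_groups = sorted({group_of[w] for w in group_of if is_up[w]})
    if not live_groups:
        return []
    best = live_groups[0]
    return [w for w in group_of if is_up[w] and group_of[w] == best]

def pick_wan(rule_order: List[str], group_of: Dict[str, int], is_up: Dict[str, bool]) -> Optional[str]:
    """Walk the rule's priority list, skipping WANs the Dashboard groups currently exclude."""
    allowed = set(usable_wans(group_of, is_up))
    for wan in rule_order:
        if wan in allowed:
            return wan
    return None

is_up = {"Verizon": True, "Spectrum": True, "ATT": True}
rule_order = ["Spectrum", "ATT", "Verizon"]  # hypothetical per-MAC rule

split_groups = {"Verizon": 1, "Spectrum": 2, "ATT": 2}
print(pick_wan(rule_order, split_groups, is_up))  # Verizon

same_group = {"Verizon": 1, "Spectrum": 1, "ATT": 1}
print(pick_wan(rule_order, same_group, is_up))  # Spectrum
```

With Verizon alone in group 1, a rule that lists Spectrum first still lands on Verizon; putting all three WANs in group 1 lets the rule’s own order take effect.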

Xerxes · March 4, 2022, 7:41pm

My dashboard status page does not look like that:

If I go into each WAN, there is not an option to order them based on priority. They are either always on or backup. All are set to always on (priority 1).

C_Metz · March 4, 2022, 8:11pm

The next thing I can think of to check is… rules are matched from the top down in the rule base (outbound policy). So if traffic matches rule #1 in your policy but your priority rule is rule #3, the session will only obey the first rule it matches. Check your outbound policy for a rule higher up that the traffic is matching before it ever gets to the priority rule you want it to use.
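A small sketch of top-down, first-match rule evaluation, to show how a broad rule near the top can shadow a more specific rule below it. The `Rule` class, rule names, ports, and MAC address are all hypothetical placeholders:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over a session's attributes
    wan_order: List[str]              # WAN priority list for sessions this rule catches

def first_matching_rule(rules: List[Rule], session: dict) -> Optional[Rule]:
    """Evaluate rules top-down and stop at the first match; later rules are never consulted."""
    for rule in rules:
        if rule.matches(session):
            return rule
    return None

LAPTOP_MAC = "aa:bb:cc:dd:ee:ff"  # placeholder MAC

rules = [
    Rule("#1 all web traffic", lambda s: s["dst_port"] in (80, 443), ["Verizon"]),
    Rule("#2 ssh", lambda s: s["dst_port"] == 22, ["ATT"]),
    Rule("#3 laptop priority", lambda s: s["mac"] == LAPTOP_MAC, ["Spectrum", "ATT", "Verizon"]),
]

web = {"mac": LAPTOP_MAC, "dst_port": 443}
print(first_matching_rule(rules, web).name)  # #1 all web traffic (rule #3 never sees it)
```

The laptop’s web sessions follow rule #1’s WAN list, even though rule #3 was written specifically for that MAC.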

Rick-DC · March 5, 2022, 11:34am

Hi Tom. “Failure to failover” smells like an issue with Outbound Policies, as @C_Metz suggested. Can you show us what those settings look like, particularly the last one, which is likely to be “Default”? Please expand that one so we can see your settings.

erickufrin · March 5, 2022, 9:39pm

@Xerxes Can you confirm that a Health Check is configured on the wan in question?

If no health check is set, there will be no failover when traffic is not passing.
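To illustrate why a missing health check means no failover, here is a simplified model of a Priority rule backed by per-WAN health checks. The thresholds, class name, and counting logic are assumptions for illustration, not Peplink’s actual health-check algorithm:

```python
from typing import Dict, List, Optional

class HealthCheck:
    """Marks a WAN down after `fail_threshold` consecutive failed probes,
    and back up after `recover_threshold` consecutive successes (hypothetical values)."""

    def __init__(self, fail_threshold: int = 3, recover_threshold: int = 3):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.up = True
        self._fails = 0
        self._oks = 0

    def record(self, probe_ok: bool) -> None:
        if probe_ok:
            self._oks += 1
            self._fails = 0
            if not self.up and self._oks >= self.recover_threshold:
                self.up = True
        else:
            self._fails += 1
            self._oks = 0
            if self.up and self._fails >= self.fail_threshold:
                self.up = False

def pick_priority(wan_order: List[str], checks: Dict[str, Optional[HealthCheck]]) -> Optional[str]:
    """Return the first WAN in the rule's order that is considered up.
    A WAN with no health check (None) is always treated as up, so it never fails over."""
    for wan in wan_order:
        hc = checks.get(wan)
        if hc is None or hc.up:
            return wan
    return None

checks = {"Verizon": HealthCheck(), "Spectrum": HealthCheck(), "ATT": HealthCheck()}
for _ in range(3):
    checks["Verizon"].record(False)  # simulate the outage
print(pick_priority(["Verizon", "Spectrum", "ATT"], checks))  # Spectrum
```

If `checks["Verizon"]` were `None` instead, the same rule would keep returning Verizon during the outage, which matches the “no failover” symptom.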

Xerxes · March 6, 2022, 12:34am

I have about 50 outbound policy rules. All use MAC addresses, with a few exceptions. The first couple of rules are there to let the various computers reach the router/modem config pages, regardless of which connection they are running on. The second-to-last rule is HTTP persistence and the last is Default.

Health checks are configured on all three WANs, using DNS lookup. However, this has me confused about how it works: from which connection is it doing the lookup? During the last outage, the dashboard showed a WAN as red with a failed DNS lookup, so connections should have failed over, but they didn’t.

The Balance One is acting as the switch for all 3 connections (Balance 30 LTE: cellular, Balance One: ethernet, Nighthawk: cellular). WANs 1/2/3 use DNS servers 8.8.8.8 and 4.4.4.4. If the Balance One is doing the DNS health check for WAN2, which is on the Balance 30 LTE, the Balance One itself may still have an internet connection through another WAN, so maybe that’s part of the problem? Unless it’s querying the DNS servers from the IP address assigned to it as a client on that WAN?

C_Metz · March 6, 2022, 4:33am

I just want to verify that the Balance One is currently acting as the central router on the network, NAT-routing clients from address range ?.?.?.? to 10.0.1.1, 10.0.2.1, and 192.168.100.1, respectively.

To answer your DNS query question, the queries would come from 10.0.1.2, 10.0.2.2, and 192.168.100.?
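For reference, a DNS health check pinned to one WAN boils down to sending a query from that WAN’s own address and waiting for a reply. This is a minimal, hypothetical sketch of that idea using a hand-built RFC 1035 query; it is not how Peplink implements its health checks. Note that on an ordinary host, binding the source IP only chooses the address; a source-based route is still needed to send the packet out the matching interface, which is what a multi-WAN router handles internally.

```python
import random
import socket
import struct

def build_dns_query(hostname: str, qid: int) -> bytes:
    """Build a minimal DNS query (RFC 1035) for an A record with recursion desired."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD flag set, 1 question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def dns_probe(server: str, hostname: str, source_ip: str, timeout: float = 2.0) -> bool:
    """Send one query from `source_ip` and report whether a reply with the matching ID came back."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.bind((source_ip, 0))  # pin the probe to one WAN-side address
            sock.sendto(build_dns_query(hostname, qid), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # bind errors and timeouts (socket.timeout subclasses OSError)
            return False
    return len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == qid
```

For example, `dns_probe("8.8.8.8", "example.com", source_ip="10.0.1.2")` would test only the upstream that owns 10.0.1.2; if that link is dead, the probe times out even though the other WANs are fine.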

Xerxes · March 6, 2022, 5:23am

This is almost spot on. The Spectrum modem is in bridge mode, so the Balance One is 10.0.0.1 and is responsible for DHCP on the network and hands out 10.0.0.* addresses to all clients. The Balance One then figures out which route to send the clients to.

C_Metz · March 6, 2022, 5:46am

Cool. Maybe @erickufrin and @Rick-DC have other things to check, but as long as all those 10.0.X.X address ranges are /24 (255.255.255.0), a WAN that is red / marked down on the dashboard should fail over to the next connection. The last 2 rules in your policy both make traffic use whichever connection has the lowest latency, but since they are at the bottom, they would only be used as a last resort, and they still wouldn’t route traffic to a down (red) WAN connection. Here’s the updated diagram.
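The /24 condition can be checked mechanically with Python’s standard `ipaddress` module. The subnets below come from the thread (LAN 10.0.0.0/24 plus the three upstream gateways), assuming each upstream is also a /24; the labels are illustrative:

```python
import ipaddress
from itertools import combinations

def find_overlaps(subnets: dict) -> list:
    """Return every pair of named subnets whose address ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]

segments = {
    "LAN": "10.0.0.0/24",
    "WAN to 10.0.1.1": "10.0.1.0/24",
    "WAN to 10.0.2.1": "10.0.2.0/24",
    "WAN to 192.168.100.1": "192.168.100.0/24",
}
print(find_overlaps(segments))  # [] (no conflicts)

too_wide = dict(segments, LAN="10.0.0.0/16")  # a /16 LAN would swallow two upstreams
print(find_overlaps(too_wide))
```

With a /16 LAN, both 10.0.1.0/24 and 10.0.2.0/24 fall inside the LAN range, which is the kind of conflict that would confuse routing.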

Xerxes · March 6, 2022, 6:21am

Awesome, thanks for drawing that. It looks correct.

My hack has been to use Enforced, since Priority has never done anything useful for me.

Here are the health checks:

erickufrin · March 6, 2022, 6:57am

Which WAN was down during the outage?

The 2nd rule in your outbound policy looks odd: the destination is the Peplink’s local IP, but the policy sends that traffic out a WAN. Why is that rule there?

I would also uncheck “obtain DNS server address automatically” on the Spectrum WAN.

Xerxes · March 6, 2022, 7:08am

The Verizon WAN is the one that went down, and none of the rules with Verizon as first priority failed over. I had to switch each rule to Enforced and choose Spectrum or AT&T to restore service.

Those top four rules were originally there so I could access each device’s admin page. I think that is legacy stuff. If each device has a 10.0.0.* IP, will they all be able to access the other modem admin pages without those outbound policy rules? If it will all still work, I can remove those four rules.

I will uncheck the auto DNS checkbox on the Spectrum.

erickufrin · March 6, 2022, 7:27am

Rules 1, 3, and 4 are OK.

Rule 2 should be removed. I’m not certain rule 2 would cause this issue, but it just doesn’t make sense: as written, it should cause traffic destined for the LAN IP of the Balance One to be sent out Verizon. Peplink must have some sort of built-in protection against lockout, because I’d expect sending traffic destined for the device’s own LAN IP out a WAN to make the management web page unreachable. I can’t imagine a scenario where this setup makes sense for you, based on the screencaps/diagrams above. If the IP range of any WAN actually overlapped the LAN, I would change the LAN IP/subnet range so they don’t conflict. Since your Verizon WAN doesn’t actually use 10.0.0.0/24, you would be routing 10.0.0.1 traffic into a black hole.
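Erick’s reasoning about rule 2 can be expressed as a quick check: a destination inside the LAN subnet, forced out a WAN whose own subnet does not contain it, has nowhere to go. The function name is hypothetical, and the WAN subnet in the example is a placeholder, since the thread doesn’t say which upstream range belongs to Verizon:

```python
import ipaddress

LAN = ipaddress.ip_network("10.0.0.0/24")  # Balance One LAN, per the thread

def blackholes(destination: str, wan_subnet: str) -> bool:
    """True if a rule sends LAN-local traffic out a WAN whose subnet does not contain the destination."""
    dst = ipaddress.ip_address(destination)
    return dst in LAN and dst not in ipaddress.ip_network(wan_subnet)

PLACEHOLDER_WAN = "192.168.100.0/24"  # hypothetical Verizon-side subnet
print(blackholes("10.0.0.1", PLACEHOLDER_WAN))  # True: rule 2's destination
print(blackholes("8.8.8.8", PLACEHOLDER_WAN))   # False: ordinary internet destination
```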

Xerxes · March 6, 2022, 4:36pm

Ok, done. I removed that rule. Yeah, I’m not sure why it was there. It was probably left over from a few years ago.