Add a priority option to not fall back on link recovery


#1

When a link fails over to a secondary link, we would like all sessions, including new sessions, to stay on that secondary link even when the primary link comes back online. We would like to manually fail back to the primary circuit after hours or at some scheduled time.
The reason for this is that we have seen flapping on the primary circuit, which causes the Peplink to flap between WAN1 and WAN2. If the failover would just stay on the secondary circuit, the problem of flapping would be solved.

Thanks,

Troy


#2

Thanks Troy. The Peplink shouldn’t flap back to the primary link if we don’t check “Terminate Sessions on Link Recovery” under the outbound policy. Do we have it checked?


#3

We understand how “Terminate Sessions on Link Recovery” works. But with it checked or unchecked, we can still have issues where the Peplink cycles between WAN1 and WAN2 if WAN1 is flapping.

Let’s assume WAN1 is having circuit issues. It goes online for a minute, then offline for a minute, in a constant cycle. The Peplink in Priority mode will see WAN1 go offline and point traffic to WAN2. A minute later it will see WAN1 is online and will route all new traffic out WAN1. A minute later WAN1 goes offline again and all the users lose their sessions. The Peplink starts sending traffic out WAN2 again. This cycle is what I would like to protect against.

I would like an option in the Priority settings for “manual fail back”. If that setting is checked, when the Peplink fails over to WAN2 it stays there. All traffic flows through WAN2 until we manually log in and set the policy to Enforced WAN1 to move the traffic back. Then we set the policy back to Priority to reset the failover capabilities.
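
To make the difference concrete, here is a rough sketch of the two behaviours. It is only a toy model, not Peplink code: the function names and the one-sample-per-minute list of WAN1 states are assumptions made up for illustration.

```python
from typing import List


def priority_mode(wan1_up: List[bool]) -> List[str]:
    """Plain Priority: new traffic uses WAN1 whenever it is up, otherwise WAN2."""
    return ["WAN1" if up else "WAN2" for up in wan1_up]


def failover_and_stay(wan1_up: List[bool]) -> List[str]:
    """Hypothetical 'manual fail back': once on WAN2, stay there.

    Only an operator action (not modelled here) would move traffic back.
    """
    active = "WAN1"
    chosen = []
    for up in wan1_up:
        if not up:
            active = "WAN2"  # fail over and latch on WAN2
        chosen.append(active)
    return chosen


def count_switches(links: List[str]) -> int:
    """Number of times the active link changes from one minute to the next."""
    return sum(1 for a, b in zip(links, links[1:]) if a != b)


# WAN1 flaps: up for a minute, down for a minute, and so on.
flapping = [True, False, True, False, True, False]

print(count_switches(priority_mode(flapping)))      # 5
print(count_switches(failover_and_stay(flapping)))  # 1
```

With the flapping pattern above, Priority moves traffic on every change of WAN1, while the latched version moves it once and then leaves it alone.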

Troy


#4

I see where you are coming from, Troy. But the application of this feature could be limited. I mean, if WAN1 is flapping that much, then I would imagine WAN2 should be assigned a higher priority, at least for the time being. Don’t you agree? :confused:


#5

The point of my “failover and stay” option was that it would be an automated method of protecting us from flapping. Changing the priority would require us to know about the issue and make the change. During all that time the users would be having connectivity issues.

Troy


#6

Got you, Troy. We will take a closer look at this. Thanks for bringing this up.


#7

Kurt,

Maybe this should be a new Load Balancing Algorithm called Failover instead of an option under Priority. The Failover option would fail over and stay until manually failed back.

Here is another example of why I requested it.

We have a VoIP ALG device that sits behind the Peplink. The VoIP ALG device does MOS scores and other VoIP-specific tasks that the Peplink cannot do. So we need this device for our hosted VoIP system.

The VoIP ALG device creates a session with the PBX on the Internet. This session never times out because it has a 5 second heartbeat. The VoIP phones only have sessions when calls are made, and those sessions do expire. Here is the issue we have with Priority and the reason I am requesting failover and stay.

WAN1 is up. The VoIP ALG has a session to the Internet PBX. Phones are using WAN1. WAN1 goes down. All calls are terminated. We have 2 options for moving to WAN2. I will describe how they impact the users.

Option 1. Priority with “Terminate Sessions on Link Recovery” turned OFF.
The VoIP ALG starts a session on WAN2. The phones start using WAN2. When WAN1 comes back online, the phones whose sessions expire will start to use WAN1. However, the VoIP ALG device’s session does not expire, so it stays on WAN2. Now we have part of our phone system on WAN1 and the other part on WAN2. The Internet PBX does not like this, and inbound calls do not work. If we had my failover and stay feature, the entire phone system would stay on WAN2 and inbound calls would work. After hours we could manually switch them back to WAN1 so they would not lose any calls.

Option 2. Priority with “Terminate Sessions on Link Recovery” turned ON.
This sounds like a good solution because it forces all traffic back to WAN1. The problem is that when WAN1 comes back online it terminates all the sessions on WAN2. This is fine for computer Internet sessions but very bad for VoIP. With VoIP, all the current calls will be disconnected. So the users get disconnected calls twice. The first time is when WAN1 fails. This is understandable and acceptable. But then they get their calls disconnected again when WAN1 comes back online. This second outage is not acceptable. We wanted to use the Peplink device for redundancy and to limit the outages they have. If we had a failover and stay option we would not have that second VoIP outage.
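
The two options and the requested behaviour can be compared side by side with a small model of one WAN1 failure followed by one WAN1 recovery. This is a simplified sketch of the scenario described above, not of how the Peplink works internally. The mode names, the `Outcome` type, and the assumption that phone sessions have expired by the time we look are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    call_drops: int    # how many times active calls get disconnected
    split_links: bool  # True if the ALG and the phones end up on different WANs


def simulate(mode: str) -> Outcome:
    """Model one WAN1 failure followed by one WAN1 recovery.

    mode is one of "priority_terminate_off", "priority_terminate_on",
    or "failover_and_stay" (the hypothetical requested feature).
    """
    if mode not in ("priority_terminate_off", "priority_terminate_on",
                    "failover_and_stay"):
        raise ValueError(f"unknown mode: {mode}")

    # WAN1 fails: calls on WAN1 drop, ALG and phones move to WAN2.
    drops = 1
    alg_link = phone_link = "WAN2"

    # WAN1 recovers.
    if mode == "priority_terminate_on":
        drops += 1                       # sessions on WAN2 are terminated
        alg_link = phone_link = "WAN1"
    elif mode == "priority_terminate_off":
        phone_link = "WAN1"              # expired phone sessions restart on WAN1
        # the ALG heartbeat session never expires, so alg_link stays on WAN2
    # failover_and_stay: nothing changes until an operator fails back

    return Outcome(call_drops=drops, split_links=alg_link != phone_link)


for mode in ("priority_terminate_off", "priority_terminate_on", "failover_and_stay"):
    print(mode, simulate(mode))
```

In this model only the failover and stay mode ends with a single call drop and the whole phone system on one link.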

Troy


#8

Thanks for your elaboration, Troy.

One immediate solution to this is to configure your PBX to accept an IP change. We use Polycom phones with Asterisk and this is quite easily done. Maybe this could be an immediate solution for you too.

We will mark a feature request for a “failover and stay” load balancing algorithm. This could possibly be a dynamic priority rule that adjusts link priority when a link goes down. We will look into this.
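
As a rough illustration of what such a dynamic priority rule might look like, the sketch below demotes a link to the lowest priority when it goes down, so that it is not preferred again after it recovers. This is an assumption about one possible design, not a description of any existing or planned implementation. The function names and the priority dictionary are hypothetical.

```python
from typing import Dict, Optional


def demote_on_failure(priorities: Dict[str, int], failed: str) -> Dict[str, int]:
    """Return new priorities with the failed link moved below all others.

    Lower number means higher priority, as in a typical priority list.
    """
    updated = dict(priorities)
    updated[failed] = max(priorities.values()) + 1
    return updated


def pick_link(priorities: Dict[str, int], up: Dict[str, bool]) -> Optional[str]:
    """Pick the highest-priority link that is currently up, or None."""
    candidates = [link for link in priorities if up[link]]
    return min(candidates, key=lambda link: priorities[link]) if candidates else None


priorities = {"WAN1": 1, "WAN2": 2}

# WAN1 goes down: traffic moves to WAN2 and WAN1 is demoted.
priorities = demote_on_failure(priorities, "WAN1")
print(pick_link(priorities, {"WAN1": False, "WAN2": True}))  # WAN2

# WAN1 comes back: WAN2 still wins because WAN1 now has lower priority.
print(pick_link(priorities, {"WAN1": True, "WAN2": True}))   # WAN2
```

Failing back would then be a matter of an operator restoring the original priorities at a convenient time.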