High availability disappointing (and dangerous)

I’m quite disappointed in the high availability feature, and turning it on has killed my cable modem.

My justification for buying a pair of Balance 20X was the high availability feature, but it doesn’t have the features I needed. The implementation is disappointing. Also, for reasons which are not at all obvious, it has killed my cable modem.

My network, with the Balance 20X pair, looks something like this:

Which looks remarkably similar to Sample Configuration 2 in RFC 5798, Virtual Router Redundancy Protocol (VRRP) Version 3 for IPv4 and IPv6 (the one which defines VRRP). I’d like to use VRRP (the high availability feature) to implement this.

However, the Peplink VRRP implementation doesn’t have the features necessary to implement this. It only allows a device to be either a master or a backup, and for only one network. The configuration I want needs each device to be a master router for one network and a backup router for a second network. (Just like the sample configuration.)
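To make the difference concrete, here is a minimal Python sketch of Sample Configuration 2 as I read it: two virtual routers (VRIDs), with each physical device the preferred master for one and the backup for the other. The device names, the VRID numbers, the priority values and the helper functions are all illustrative assumptions, not Peplink settings, and the election is simplified to "highest priority among the devices that are up".

```python
# Simplified model of RFC 5798 Sample Configuration 2.
# Device names, VRIDs and priorities are illustrative assumptions.

# PRIORITIES[vrid][device]: the higher value wins the election for that VRID.
PRIORITIES = {
    1: {"router_a": 255, "router_b": 100},  # router_a is preferred for VRID 1
    2: {"router_a": 100, "router_b": 255},  # router_b is preferred for VRID 2
}


def elect_master(vrid, alive):
    """Return the live device with the highest priority for this VRID, or None."""
    candidates = {
        dev: prio for dev, prio in PRIORITIES[vrid].items() if dev in alive
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def masters(alive):
    """Map each VRID to its current master, given the set of live devices."""
    return {vrid: elect_master(vrid, alive) for vrid in PRIORITIES}


if __name__ == "__main__":
    # Both devices up: each one is master for its own VRID.
    print(masters({"router_a", "router_b"}))
    # router_a down: router_b is master for both VRIDs.
    print(masters({"router_b"}))
```

An implementation that only lets a device take part in a single VRID cannot express the second row of the table, which is the limitation described above.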

This limitation wasn’t obvious from the documentation alone. I had hoped that the high availability tab might be able to have multiple profiles, like a lot of other Peplink features.

It’s not clear if the high availability actually worked because, as I said, turning it on killed my cable modem. Cable modems with Comcast are locked to a single MAC address, so I did wonder if the WAN port had started using a virtual MAC address. I tried a packet capture on the WAN port while high availability was turned on, and no packets were captured on that port. It gave me a capture of the LAN port instead, which is puzzling.

The WAN port status is showing a private IP address in the same range as the modem’s admin interface; it usually shows the public IP of the cable connection. The WAN is also showing a status of down, due to the DNS test failing. This looks like the modem is not passing traffic from the router (but I couldn’t confirm this, as the packet capture was behaving wackily).

The first time I tried high availability, the modem recovered after a power cycle. This second time, the modem did not recover. It’s now unplugged in the hope it’ll get better with some downtime.

After being unplugged for an hour, the modem seems to be back to normal.

I worked out why it’s killing the cable modem: it’s changing the Ethernet MAC address on the WAN port. Comcast modems are sensitive to that, and can go into a snit which takes half an hour or more to recover from.

What’s worse is that the MAC address used is an IANA MAC address which is currently unassigned (i.e. not Peplink’s, and potentially conflicting with future features). It’s using 00:00:5e:01:01:fa, whereas 00:00:5e:00:01:fa is the virtual MAC address that VRRP should be using for the high availability feature.

00:00:5e is IANA’s allocation, and the relevant document lists 00-54-00 to 90-00-FF as Unassigned.
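As a rough check of the two addresses, the following Python sketch builds the virtual MAC that RFC 5798 prescribes for a given VRID (00-00-5E-00-01-{VRID} for IPv4, 00-00-5E-00-02-{VRID} for IPv6) and tests whether an address falls in the unassigned range quoted above. The function names are made up for illustration, and the range bounds are simply the ones quoted in this post.

```python
# VRRP virtual MAC layout per RFC 5798. Function names are illustrative.

def vrrp_virtual_mac(vrid, ipv6=False):
    """Build the VRRP virtual MAC for a VRID in 1..255."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    family = 0x02 if ipv6 else 0x01
    return "00:00:5e:00:%02x:%02x" % (family, vrid)


def in_unassigned_block(mac):
    """True if mac is 00:00:5e:xx:xx:xx with its low three octets in
    00-54-00..90-00-FF (the range quoted above as unassigned)."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6 or octets[:3] != b"\x00\x00\x5e":
        return False
    low = int.from_bytes(octets[3:], "big")
    return 0x005400 <= low <= 0x9000FF


if __name__ == "__main__":
    observed = "00:00:5e:01:01:fa"        # seen on the WAN port with HA on
    expected = vrrp_virtual_mac(0xFA)     # VRID 250, taken from the last octet
    print(expected, in_unassigned_block(expected))
    print(observed, in_unassigned_block(observed))
```

The only difference between the two addresses is the fourth octet (00 versus 01), which is enough to move the observed address out of the VRRP block.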

Here’s a sample DHCP packet from the Balance

13:17:08.035050 10:56:ca:6e:33:xx > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 343: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 10:56:ca:6e:33:e1, length 301
0x0000: ffff ffff ffff 1056 ca6e 33xx 0800 4500
0x0010: 0149 0000 0000 4011 79a5 0000 0000 ffff
0x0020: ffff 0044 0043 0135 3ad2 0101 0600 4756
0x0030: 7c2e 0b08 0000 0000 0000 0000 0000 0000
0x0040: 0000 0000 0000 1056 ca6e 33e1 0000 0000
0x0050: 0000 0000 0000 0000 0000 0000 0000 0000
0x0060: 0000 0000 0000 0000 0000 0000 0000 0000
0x0070: 0000 0000 0000 0000 0000 0000 0000 0000
0x0080: 0000 0000 0000 0000 0000 0000 0000 0000
0x0090: 0000 0000 0000 0000 0000 0000 0000 0000
0x00a0: 0000 0000 0000 0000 0000 0000 0000 0000
0x00b0: 0000 0000 0000 0000 0000 0000 0000 0000
0x00c0: 0000 0000 0000 0000 0000 0000 0000 0000
0x00d0: 0000 0000 0000 0000 0000 0000 0000 0000
0x00e0: 0000 0000 0000 0000 0000 0000 0000 0000
0x00f0: 0000 0000 0000 0000 0000 0000 0000 0000
0x0100: 0000 0000 0000 0000 0000 0000 0000 0000
0x0110: 0000 0000 0000 6382 5363 3501 033d 0701
0x0120: 1056 ca6e 33xx 3204 c0a8 640a 3604 c0a8
0x0130: 6401 3707 0103 060c 0f1c 2a3c 0c75 6468
0x0140: 6370 2031 2e33 332e 310c 0b54 7779 6372
0x0150: 6f73 734a 616d ff

And when HA is turned on, it tries this for DHCP:

17:08:55.279617 00:00:5e:01:01:fa > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:00:5e:01:01:fa, length 300
0x0000: ffff ffff ffff 0000 5e01 01fa 0800 4500
0x0010: 0148 0000 0000 4011 79a6 0000 0000 ffff
0x0020: ffff 0044 0043 0134 536a 0101 0600 9ba7
0x0030: 2b0a 0000 0000 0000 0000 0000 0000 0000
0x0040: 0000 0000 0000 0000 5e01 01fa 0000 0000
0x0050: 0000 0000 0000 0000 0000 0000 0000 0000
0x0060: 0000 0000 0000 0000 0000 0000 0000 0000
0x0070: 0000 0000 0000 0000 0000 0000 0000 0000
0x0080: 0000 0000 0000 0000 0000 0000 0000 0000
0x0090: 0000 0000 0000 0000 0000 0000 0000 0000
0x00a0: 0000 0000 0000 0000 0000 0000 0000 0000
0x00b0: 0000 0000 0000 0000 0000 0000 0000 0000
0x00c0: 0000 0000 0000 0000 0000 0000 0000 0000
0x00d0: 0000 0000 0000 0000 0000 0000 0000 0000
0x00e0: 0000 0000 0000 0000 0000 0000 0000 0000
0x00f0: 0000 0000 0000 0000 0000 0000 0000 0000
0x0100: 0000 0000 0000 0000 0000 0000 0000 0000
0x0110: 0000 0000 0000 6382 5363 3501 013d 0701
0x0120: 0000 5e01 01fa 3707 0103 060c 0f1c 2a3c
0x0130: 0c75 6468 6370 2031 2e33 332e 310c 0b54
0x0140: 7779 6372 6f73 734a 616d ff00 0000 0000
0x0150: 0000 0000 0000
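For anyone who wants to read the dump without tcpdump, here is a small Python sketch that walks the DHCP options of the HA-enabled packet above (the bytes following the 63 82 53 63 magic cookie). The option bytes are copied from that dump; the parser itself is a minimal illustration, not a full DHCP decoder, and the function name is my own.

```python
# Walk the DHCP options of the HA-enabled packet shown above.
# The hex below is copied from the dump, starting after the magic cookie.

OPTIONS_HEX = (
    "350101"                        # option 53: DHCP message type
    "3d070100005e0101fa"            # option 61: client identifier
    "37070103060c0f1c2a"            # option 55: parameter request list
    "3c0c756468637020312e33332e31"  # option 60: vendor class identifier
    "0c0b54777963726f73734a616d"    # option 12: host name
    "ff"                            # option 255: end
)


def parse_dhcp_options(data):
    """Return {option_code: value_bytes}, stopping at the End option (255)."""
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == 255:   # End marker
            break
        if code == 0:     # Pad: a single byte with no length field
            i += 1
            continue
        length = data[i + 1]
        options[code] = data[i + 2 : i + 2 + length]
        i += 2 + length
    return options


if __name__ == "__main__":
    opts = parse_dhcp_options(bytes.fromhex(OPTIONS_HEX))
    client_id = opts[61]
    # First byte of option 61 is the hardware type (1 = Ethernet); the rest is the MAC.
    mac = ":".join("%02x" % b for b in client_id[1:])
    print("message type:", opts[53][0])
    print("client id MAC:", mac)
    print("vendor class:", opts[60].decode("ascii"))
```

The client identifier carries the same 00:00:5e:01:01:fa address as the Ethernet source, so the altered MAC is what the DHCP server sees as well.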

You’ve set it up incorrectly. The VRRP implementation Peplink is using only does VRRP on the LAN side. That way you don’t need multiple WAN IPs for an HA setup. It only supports a Master/Slave setup, so only Active/Standby.

The whole config is copied from Router1 to Router2. Router1 and Router2 will have the same config, so only one of them will be active at a time. You can override the MAC address on the WAN interface, so that MAC address is copied to the second router and used when it fails over. You can even use the original WAN MAC address of the first router to do that. Router1 should have the configuration of all VLANs on the LAN side.

When the ping on the LAN fails between the two, the slave will become master and the master will become slave.

Works fine for us with two connections linked to two Balance 20X units, with both connections using DHCP to provide the public IP. A setup similar to this:
image.png

For our ISPs it is important to have the simple switches in between the ISP modem and the WAN connection of the router, as our ISPs only provide one active Ethernet connection per modem/FTU. If your ISP provides a bridged modem that does support multiple Ethernet links, you might connect the modem directly to both routers’ WAN ports.

We use a USB-Ethernet adapter for the second WAN link on the 20X as we need PPPoE on the DSL connection. Otherwise you could use a VWAN for the second WAN connection.

You’ve set it up incorrectly.

That’s a matter of opinion. The guys who wrote the standard thought it was a good idea, and so do I.

I worked out how to stop it changing the MAC address: if you use the “MAC Address Clone” feature in the WAN interface, you can set that to the MAC address it would be using anyway, and then it’ll continue to use that address after turning on HA.

I’ve now managed to turn on HA on the master unit without everything going to hell. That’s a step forward.

At the very least this needs to be documented.

It is already documented how to set up HA with Peplink:

https://manual.peplink.com/documentation/peplink-balance-and-mediafast-firmware-manual/ch12-network-tab/misc-settings/high-availability/

Yup. And, in our experience it absolutely works “as advertised.” The instructions are simple and easy to implement. :<)

I did all that.

It doesn’t document that it’ll change the WAN MAC address. That’s unnecessary, and uses unassigned addresses.

It doesn’t document that it’ll turn off the WiFi AP. I’m paying for a router and an access point, and I want to use them.

It could also go a lot further: the routers work with multiple VLANs, so why not the high availability feature?

And another “quirk”, as noted above: the high availability feature disables the WiFi access point. This is not documented, and is totally unnecessary. I could be using the HA feature; crippled as it is, it could be useful as far as it goes. But turning off the AP (unannounced) left a big hole in my WiFi coverage.

You are still not getting how the HA works as implemented by Peplink. It is active/backup, not active/active. The second unit is not doing much when in backup/slave mode, this is by design.

Your master needs to have all the configurations and you need to enable sync on the slave so all config will be synced from the first to the second. It only needs VRRP in the native VLAN as it will copy all config including VLANs and Wifi to the second router.

We have 5 active VLANs configured on the Master and have the Wifi AP enabled. When the master fails all those tasks will be taken over by the Slave as that one will become the Master. So all VLANs and Wifi AP will be run by the second router in that instance.

Yes, that’s what makes it disappointing, and dangerous.

The behavior is not documented. The behavior is not as I’d expect after reading the VRRP RFC, and the Peplink documentation. The current implementation is severely crippled, and this had already been turned into a feature request.

If Peplink had been honest about what the feature actually does, I might not have bothered to buy the units in the first place.

Also, I have no idea how you manage to back up 5 VLANs, when the UI only admits to there being one LAN. But I don’t really care about those details.

Just a quick sanity check to make sure I understand what you are doing:

In your HA pair, do you have the slave enabled for automatic configuration sync with the master? That should (and in my experience does) copy pretty much everything from the master configuration whenever that is updated. VLANs and all…

That it does not copy the MAC address is reasonable, and the fact that your ISP locks to a particular MAC address would be a complaint with the ISP.

FWIW: We’ve been running HA pairs with B380s, B20X and MAX HD2s. When configured as per the Peplink instructions the failovers have worked as expected.

Good luck,

Z

I don’t have automatic configuration sync. That seemed like an optional feature, and I wanted the routers to work independently and ideally back each other up. Though I’d take one backing the other up as better than nothing.

When I tested it, it worked fine as far as taking over running the LAN seamlessly, but all the other limitations/issues with the implementation, which are not disclosed, are my problem.

I’d suggest enabling automatic configuration sync. It should address most of your challenges, I expect.

Z

It wouldn’t; it would completely screw up my network.

Hi @Barry_Twycross1250 . I think I may be among those who have had great results with the Peplink HA scheme – and who are confused by the issues you raise. So there are no misunderstandings, would it be possible for you to set forth very concisely (1) where you think Peplink’s actual implementation of VRRP differs from what Peplink has discussed here in the Forum and documented in the FW docs? (2) The objectives you wish to achieve that are not met by Peplink’s implementation of VRRP? (3) what you mean, with specificity, by “all the other limitations/issues with the implementation, which are not disclosed?” (4) Why you are unable to use auto config sync? I don’t mean to be confrontational but I may be missing something here and would like to ensure I understand the issues. Thanks – appreciated! :<)

“Side note:” I understand your experience is with a B20X. That’s helpful. We have not used B20Xs in HA mode but have used B305s, B310Xs and B380s – all with external [obviously] APs.

  • Rick

  1. Peplink’s discussion of VRRP looks a lot like the RFC, at least as far as sample configuration 1 goes. You have two live routers on the LAN, and one of them backs up the other. The backup happily goes about its own business when not doing backup stuff. The VRRP is just something that happens on the LAN, and does not affect anything else.

Peplink’s discussions do not mention that synced configuration is the way to go, and the only thing supported. It looks like an entirely optional thing, which I had no interest in. With synced configuration, the only thing the backup can be is a backup; it has no independent existence as a router. (See my configuration: the routers are independent.)

The discussion makes no mention (the documentation makes no mention at all) that the WiFi AP will be turned off, and that the WAN MAC address will be altered.

These were all a complete surprise/shock to me, even after I’d read all the documentation I could find.

  2. I want to have two routers which are totally independent, which have their own WANs, and which take care of their own VLANs. The two routers should back each other up, so if one goes down, the other takes over the virtual MAC and becomes the other router, as well as carrying on with its own duties as a router.

That is the Sample configuration 2 from the RFC.

  3. The things I mentioned in 1: sync being the only way, the WAN virtual MAC, and the WiFi AP.

  4. Because the two routers are independent of each other. I’m using all the features of the unit, and those would be overwritten with inappropriate configurations, or turned off, if I synced them.

Side note:

The B20X has a very nice AP, complete with external WiFi antennae, which works much better than the internal AP in the Balance One they’re replacing (with no external antennae). The two units are in different places to give better overall WiFi coverage.

I also have an AP-One-Rugged as an external AP, for various reasons, which extends the WiFi coverage as well.

I think we have arrived at the core issue. You would like an architecture where

and

In other words, you would like to establish a peer-to-peer relationship, with one stepping in for the other if one of them fails, and where stepping in entails enabling a copy of the configuration of the failing one while still serving as per its previous configuration as well.

Peplink’s high availability function does not do that - the members of a Peplink HA pair are in a master/slave relationship, where a failure of the master activates the slave. While the master is active, the slave does (essentially) nothing.

In the documentation (cited earlier in this thread) this is stated in the introduction:

This is then elaborated upon in the step-by-step explanation further down:

I understand that you read this differently than what was intended, and there may be room for further improvement on what “master/slave” means. (However, I would claim that it is sufficiently clear and offer as evidence that others as well as I expected the behavior as implemented, based on the reading of the documentation.)

The automatic sync is indeed optional - it is simply a mechanism to ensure that the slave is configured identically with the master at all times, ready to be activated in the case of a failure of the master. With automatic synchronization not enabled, one has to maintain the configuration of the slave manually. In both cases the system architecture is the same: when the master is active, the slave is not active (except for checking on the status of the master).

With respect to your system, the method of master/slave synchronization is irrelevant, since the architecture offered by the HA pair is not the one you want.

Cheers,

Z
