Configuring 1+1 Backup by High Availability (HA)


#42

I recently came across my first HA Master failure, and we were called in to replace the unit.
Up until now, we had only demonstrated HA switchover by unplugging the Master's power cable (as mentioned earlier in this thread).
I searched the forum and the knowledge base for the required steps but couldn't find them all in one place. (OK, I didn't search thoroughly, I admit.)

So here is what I did to replace the Master device and restore the 1+1 Backup HA group, in case anyone else needs this information in the future.
(If someone from Peplink spots any mistakes, please let me know so I can correct this.)

  1. The Slave device had assumed the Master role and all SpeedFusion VPNs were up and running. However, the web interface on both the virtual IP and the Slave's own IP only showed the device status; we couldn't change any option.

Solution: As pointed out earlier in this topic, this was because the Sync from Master checkbox was active. As soon as I unchecked it, we had full access to the GUI settings.

  2. Downloaded the configuration from the Slave device.

  3. Connected to the replacement unit, downgraded its firmware to match the FW version of the original devices (ain't nobody got time for keeping critical devices up to date firmware-wise) and uploaded the configuration from the Slave device (from System->Configuration->Upload Configuration from HA Pair).

  4. After uploading the configuration, we had to change the LAN IP and the hostname of the new device.

  5. Checked in Network->Misc Settings->High Availability that all settings matched between the two devices.
    On the new device we specifically chose the Master role and Resume Master Role Upon Recovery.

  6. My biggest concern here was that we would need to reconfigure all SpeedFusion tunnels, because they used the ID of the failed device (which I thought was tied to the device's hostname).

Solution: Peplink was smarter than that. The PepVPN Local ID may default to the device's hostname, but it isn't tied to it. So after uploading the configuration, the new device had the same Local ID as the old one, and I could change the hostname without affecting the VPN tunnels.

  7. Connected the LAN and WAN interfaces to the cables that had been plugged into the failed device.

  8. As soon as the device powered on, it resumed the Master role within seconds. No issues were reported (or noticed) by the users of the SpeedFusion connections (boy, were they critical).

  9. Logged in to the Slave's IP and re-enabled the config sync from Master (with the serial number of the new device, of course).

  10. Profit
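
For anyone who would rather not compare the two High Availability pages by eye, the settings check from the steps above could be sketched roughly like this in Python. This is only an illustration under my own assumptions: the setting names and values are made up, typed in by hand from the web interface, and not read from any Peplink API or export format.

```python
# Hypothetical sketch: compare HA settings noted down by hand from the
# two devices. None of these key names come from a Peplink API or file.

def diff_ha_settings(master, slave,
                     expected_to_differ=("role", "lan_ip", "hostname")):
    """Return {setting: (master_value, slave_value)} for every setting
    that should match between the pair but does not."""
    mismatches = {}
    for key in sorted(set(master) | set(slave)):
        if key in expected_to_differ:
            continue  # these are supposed to be unique per device
        m, s = master.get(key), slave.get(key)
        if m != s:
            mismatches[key] = (m, s)
    return mismatches


# Example values (invented for illustration only).
new_master = {
    "role": "master",
    "hostname": "site-a-1",
    "lan_ip": "192.168.1.2",
    "virtual_ip": "192.168.1.1",
    "group_number": 5,
    "firmware": "example-fw-1",
}
slave = {
    "role": "slave",
    "hostname": "site-a-2",
    "lan_ip": "192.168.1.3",
    "virtual_ip": "192.168.1.1",
    "group_number": 6,  # deliberate mismatch to show what gets reported
    "firmware": "example-fw-1",
}

mismatches = diff_ha_settings(new_master, slave)
for name, (m, s) in mismatches.items():
    print(f"{name}: master={m!r} slave={s!r}")
```

The role, LAN IP and hostname are skipped on purpose, since those are the values that had to differ between the two units; everything else is expected to be identical before the new unit goes live.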

I hope this info helps someone in the future.
Again, if someone sees any unnecessary or simply wrong steps, please let me know so I don't mislead other people.


#44

What is the correct procedure for updating firmware on a master/slave pair with minimal downtime?


#45

@JPWGC

Please check the knowledge base below: