High Availability Recovery Timeout / Count?


#1

I am working on a customer issue where they have 2 x HD2 in HA with only cellular connectivity.
Intermittently, and for reasons we have not yet identified, cellular connectivity appears to be lost on both mobile network operators at the same time, which triggers an HA failover.

However, since the Slave unit uses the same cellular providers, the failover itself fails, and I suspect the pair then fails back, then over again, and so on. Even when cellular connectivity returns, the pair does not restore service.

The only way to recover service is to power-cycle both HD2s, so we lose the local logs and I can’t troubleshoot.

Is there perhaps a limit on the number of times an HA failover (and failback) can occur, such as a maximum failover retry count?

I don’t understand why service is not restored once cellular connectivity becomes available. Could it be that acquiring a health-checked cellular connection takes longer than the failover timeout, so the pair flip-flops the master/slave role forever?
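To make the timing hypothesis concrete, here is a toy simulation. It is not the HD2 firmware logic: `acquire_time`, `failover_timeout` and `outage_end` are hypothetical parameters, and it assumes a unit only starts bringing up its cellular link after it takes the master role. Under those assumptions, once cellular returns the pair settles only if `acquire_time < failover_timeout`; otherwise each new master is declared failed before its link passes its health check.

```python
def simulate(acquire_time, failover_timeout, outage_end, horizon):
    """Toy model: return the times (seconds) at which mastership switched.

    Hypothetical assumptions (not Peplink behaviour):
    - cellular is unavailable for t < outage_end, available afterwards;
    - a new master starts acquiring its link from scratch when it takes the role;
    - the link counts as healthy acquire_time seconds after acquisition starts;
    - the master hands over after failover_timeout consecutive unhealthy seconds.
    """
    unhealthy_for = 0
    acquiring_since = None  # when the current master started acquiring a link
    switches = []
    for t in range(horizon):
        if t >= outage_end:
            if acquiring_since is None:
                acquiring_since = t
            healthy = t - acquiring_since >= acquire_time
        else:
            acquiring_since = None
            healthy = False
        unhealthy_for = 0 if healthy else unhealthy_for + 1
        if unhealthy_for >= failover_timeout:
            switches.append(t)  # role moves to the peer
            unhealthy_for = 0
            acquiring_since = None  # peer starts its own link acquisition
    return switches


# Slow link acquisition: the role keeps flipping long after cellular returns at t=10.
print(simulate(acquire_time=5, failover_timeout=3, outage_end=10, horizon=60))
# Fast link acquisition: the pair settles shortly after cellular returns.
print(simulate(acquire_time=2, failover_timeout=3, outage_end=10, horizon=60))
```

In this model the switching never stops in the first case, which matches the symptom of service not coming back until a power cycle.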

Any ideas gratefully received.


#2

Loss of connectivity on all cellular links is not expected to trigger an HA failover; we will fix this in the next firmware release.
As a temporary workaround, we suggest disabling the HA option “Resume Master Role Upon Recovery” to avoid the flip-flop issue.
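Why disabling that option helps can be sketched with a toy model. It assumes, without confirmation from the source, that “Resume Master Role Upon Recovery” behaves like VRRP-style preemption: the preferred unit (A) reclaims the role whenever it looks healthy again. The health traces and the function name are hypothetical, and the model does not cover the firmware bug itself.

```python
def count_role_changes(a_healthy, b_healthy, preempt):
    """Count mastership changes between preferred unit A and backup unit B.

    Hypothetical assumption: preempt=True models "Resume Master Role Upon
    Recovery" as VRRP-style preemption by A.
    """
    master = "A"
    changes = 0
    for a_ok, b_ok in zip(a_healthy, b_healthy):
        current_ok = a_ok if master == "A" else b_ok
        if master == "B" and preempt and a_ok:
            # Preemption: A takes the role back as soon as it looks healthy.
            master = "A"
            changes += 1
        elif not current_ok:
            # Current master failed: hand the role to the peer.
            master = "B" if master == "A" else "A"
            changes += 1
    return changes


# A's uplink flaps while B stays healthy.
a = [True, False, True, False, True, False, True, True]
b = [True] * len(a)
print(count_role_changes(a, b, preempt=True))   # role follows every flap of A
print(count_role_changes(a, b, preempt=False))  # role moves once and stays on B
```

With preemption on, every recovery of A pulls the role back and every subsequent drop pushes it away again; with preemption off, the role moves once and stays put as long as the new master is healthy.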