Another weird behavior: the Health Check is configured as follows:
- 10 second timeout
- 10 second interval
- 10 retries
- 1 recovery retry
By my calculation, this means there must be 10 consecutive DNS failures (taking over 100 seconds) before the WAN is considered dead.
However, every single time this happens, the very next check succeeds, and the WAN comes back online.
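The arithmetic above can be sketched as follows. This is a minimal model, not Peplink's documented algorithm. It assumes that probes start every `interval_s` seconds (start to start) and that a probe counts as failed once `timeout_s` seconds pass with no reply. The function name `detection_window` is hypothetical.

```python
def detection_window(timeout_s, interval_s, retries):
    """Return (earliest, latest) seconds from the start of an outage until the
    link is marked down, under an ASSUMED probe-scheduling model."""
    # Assumption: probes start every `interval_s` seconds (start to start), and a
    # probe is counted as failed once `timeout_s` seconds pass without a reply.
    # If the first failed probe starts right as the outage begins, the last of
    # `retries` consecutive failures starts (retries - 1) * interval_s later and
    # is declared failed timeout_s after that.
    earliest = (retries - 1) * interval_s + timeout_s
    # If the outage begins just after a successful probe, the first failed
    # probe can start up to one interval later.
    latest = earliest + interval_s
    return earliest, latest


print(detection_window(timeout_s=10, interval_s=10, retries=10))  # (100, 110)
```

Under this model, the settings above mean the link has been failing DNS checks for roughly 100 to 110 seconds by the time it is marked down.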
Here are the last three examples:
Thu Mar 16 17:31:00 PDT 2017
WAN 1: Disconnected (Link down)
WAN 2: Connected (IP: x.x.x.x)
Thu Mar 16 17:31:04 PDT 2017
WAN 1: Connected (IP: x.x.x.x)
WAN 2: Connected (IP: x.x.x.x)
Fri Mar 17 12:59:44 PDT 2017
WAN 1: Disconnected (Link down)
WAN 2: Connected (IP: x.x.x.x)
Fri Mar 17 12:59:48 PDT 2017
WAN 1: Connected (IP: x.x.x.x)
WAN 2: Connected (IP: x.x.x.x)
Sat Mar 18 01:20:50 PDT 2017
WAN 1: Disconnected (Link down)
WAN 2: Connected (IP: x.x.x.x)
Sat Mar 18 01:20:54 PDT 2017
WAN 1: Connected (IP: x.x.x.x)
WAN 2: Connected (IP: x.x.x.x)
Notice that each time, the reconnection happens exactly 4 seconds later.
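To check the 4-second pattern across more entries than the three above, the status output can be scanned with a short script. This is a sketch that assumes the log has exactly the format shown: a timestamp line followed by one `WAN n: Status (...)` line per link. `parse_timestamp` and `outage_durations` are hypothetical helper names. The precision of the result is limited by how often the status snapshots were taken.

```python
from datetime import datetime

LOG = """\
Thu Mar 16 17:31:00 PDT 2017
WAN 1: Disconnected (Link down)
WAN 2: Connected (IP: x.x.x.x)
Thu Mar 16 17:31:04 PDT 2017
WAN 1: Connected (IP: x.x.x.x)
WAN 2: Connected (IP: x.x.x.x)
Fri Mar 17 12:59:44 PDT 2017
WAN 1: Disconnected (Link down)
WAN 2: Connected (IP: x.x.x.x)
Fri Mar 17 12:59:48 PDT 2017
WAN 1: Connected (IP: x.x.x.x)
WAN 2: Connected (IP: x.x.x.x)
Sat Mar 18 01:20:50 PDT 2017
WAN 1: Disconnected (Link down)
WAN 2: Connected (IP: x.x.x.x)
Sat Mar 18 01:20:54 PDT 2017
WAN 1: Connected (IP: x.x.x.x)
WAN 2: Connected (IP: x.x.x.x)
"""


def parse_timestamp(line):
    # Drop the timezone token (e.g. "PDT"): strptime's %Z only accepts a few
    # names, and every entry uses the same zone, so differences are unaffected.
    parts = line.split()
    del parts[4]
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


def outage_durations(log, wan="WAN 1"):
    """Seconds between each '<wan>: Disconnected' entry and the next '<wan>: Connected'."""
    durations = []
    timestamp = None
    down_since = None
    for raw in log.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("WAN "):
            # Any non-WAN line is assumed to be a timestamp header.
            timestamp = parse_timestamp(line)
            continue
        name, _, rest = line.partition(":")
        if name != wan:
            continue
        status = rest.split()[0]
        if status == "Disconnected" and down_since is None:
            down_since = timestamp
        elif status == "Connected" and down_since is not None:
            durations.append((timestamp - down_since).total_seconds())
            down_since = None
    return durations


print(outage_durations(LOG))  # [4.0, 4.0, 4.0]
```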
I find this highly suspicious: what are the odds that, every time my WAN goes dead, it stays dead for exactly 100 seconds and is back online at exactly 104 seconds?
This looks more like a bug to me, though whether it’s in the Peplink, the modem, or the DNS server, I can’t say.
How can I debug this?