Health Check Fails for Host 1 but Passes for Host 2 Event Logging

adeebag · February 7, 2017, 12:39am

I would also find this level of reporting in the event log useful. To avoid bloating the logs, there could be a “log all Health Check failures and immediate successes” setting that logs every Health Check failure plus the first success after each failure (instead of only reporting when the full range of checks for a given WAN results in a failure).
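The proposed setting amounts to a simple filter over the stream of check results. The sketch below is purely hypothetical and is not a Peplink feature or API: `CheckResult` and `entries_to_log` are made-up names, and it assumes each check result carries a timestamp, WAN name, host name and pass/fail flag.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    t: float   # seconds since some reference point (hypothetical field)
    wan: str   # e.g. "WAN1"
    host: str  # e.g. "Host 1"
    ok: bool   # True if the DNS lookup succeeded


def entries_to_log(results):
    """Keep every failed check plus the first success after a failure,
    tracked separately for each (WAN, host) pair."""
    failing = set()  # (wan, host) pairs whose most recent check failed
    logged = []
    for r in results:
        key = (r.wan, r.host)
        if not r.ok:
            logged.append(r)        # log every failure
            failing.add(key)
        elif key in failing:
            logged.append(r)        # first success after a failure
            failing.discard(key)
        # any other success is not logged, which keeps the log small
    return logged


sample = [
    CheckResult(0, "WAN1", "Host 1", True),
    CheckResult(5, "WAN1", "Host 1", False),
    CheckResult(5, "WAN1", "Host 2", True),
    CheckResult(10, "WAN1", "Host 1", False),
    CheckResult(15, "WAN1", "Host 1", True),
    CheckResult(20, "WAN1", "Host 1", True),
]
for entry in entries_to_log(sample):
    print(entry)
```

Tracking state per WAN and host (rather than per WAN only) means a Host 2 success right after a Host 1 failure is not mistaken for Host 1 recovering.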

This is especially relevant if you suspect that your WAN connection is not 100% stable.

In my case, each of my 3 WAN connections has the Health Check set to:

DNS Lookup
Host 1
Host 2
Include public DNS servers
Timeout: 2 second(s)
Health Check Interval: 5 second(s)
Health Check Retries: 3
Recovery Retries: 3

So, for the Peplink to fully test the connection and record it as “WAN failed DNS test” in the event log, it would take the following (assuming a full failure followed by a full recovery, rather than intermittent results within a cycle):

5 sec for the interval since the last successful check
2 sec timeout x 3 retries x 3 hosts (assuming only one public DNS host is tested) = 18 sec to report as down
2 sec timeout x 3 recovery retries x 1 host (presumably) = 6 sec (minimum) to report as up again
total time the WAN is potentially offline for each occurrence = 5 + 18 + 6 = 29 sec
total time the event log shows the connection as down = 6 sec (minimum)
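The same arithmetic can be written as a small function so other settings can be plugged in. It encodes the assumptions stated above (each retry on each host waits out the full timeout, and recovery is checked against one host); it is not Peplink's documented algorithm, and `outage_estimate` is a made-up name.

```python
def outage_estimate(interval_s, timeout_s, retries, hosts,
                    recovery_retries, recovery_hosts=1):
    """Back-of-the-envelope Health Check timing, using the assumptions above."""
    # Every retry against every host waits out the full timeout before the WAN is reported down.
    to_down = timeout_s * retries * hosts
    # Minimum time for the recovery retries to report the WAN up again.
    to_up = timeout_s * recovery_retries * recovery_hosts
    return {
        "to_down": to_down,
        "to_up": to_up,
        "offline": interval_s + to_down + to_up,  # worst case from the last good check
        "logged_down": to_up,                     # what the event log shows as down
    }


# Current settings: 5 sec interval, 2 sec timeout, 3 retries, 3 hosts, 3 recovery retries
print(outage_estimate(interval_s=5, timeout_s=2, retries=3, hosts=3, recovery_retries=3))
# prints {'to_down': 18, 'to_up': 6, 'offline': 29, 'logged_down': 6}
```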

29 seconds is long enough to disrupt many user sessions, especially VoIP and streaming, and we are experiencing this frequently.

We could reduce this time by setting the timeout to 1 sec, reducing the number of hosts, and reducing the recovery retries to 1, but I am not sure whether that would cause so many WAN state changes that users end up suffering even more.
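To see roughly how much each of those changes would save, the estimate can be swept over a few candidate values. This is only a sketch under the same assumptions as the calculation above (interval and Health Check Retries left at 5 sec and 3, recovery checked against one host); the candidate values and the name `total_offline` are illustrative. It says nothing about how often a shorter timeout would trigger spurious WAN state changes, which is the real trade-off here.

```python
from itertools import product

INTERVAL_S = 5  # Health Check Interval, unchanged
RETRIES = 3     # Health Check Retries, unchanged


def total_offline(timeout_s, hosts, recovery_retries):
    """Interval + every retry on every host timing out + recovery retries on one host."""
    return INTERVAL_S + timeout_s * RETRIES * hosts + timeout_s * recovery_retries


for timeout_s, hosts, rec in product((2, 1), (3, 2, 1), (3, 1)):
    print(f"timeout={timeout_s}s hosts={hosts} recovery_retries={rec}: "
          f"~{total_offline(timeout_s, hosts, rec)}s offline")
```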

This also does not cover scenarios where the Health Check failed all 3 retries on 2 hosts but the final retry on the 3rd host succeeded, i.e. the connection was actually down but the event log does not show it.
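The scenario above can be made concrete with a small model of the decision rule. It assumes (this is the post's reading of the behaviour, not documented Peplink logic) that a WAN is declared down only when no check on any host succeeds within a cycle; `declared_down`, `hidden_failures` and the host names are illustrative.

```python
def declared_down(results):
    """results maps host name -> list of per-retry outcomes (True = reply received).
    Assumed rule: the WAN is declared down only if no check on any host succeeded."""
    return not any(any(outcomes) for outcomes in results.values())


def hidden_failures(results):
    """Failed checks that never reach the event log because the WAN was not declared down."""
    if declared_down(results):
        return 0
    return sum(outcomes.count(False) for outcomes in results.values())


cycle = {
    "Host 1": [False, False, False],
    "Host 2": [False, False, False],
    "Public DNS": [False, False, True],  # final retry on the 3rd host succeeds
}
print(declared_down(cycle), hidden_failures(cycle))
```

Under this rule the cycle above is never logged even though 8 of its 9 checks failed, which is exactly the gap the proposed logging setting would close.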

Having this fuller logging in place would help us diagnose unstable connections and tune our Health Check settings for the best balance.

Thanks.

[Peplink model: Balance One Core, Firmware: 7.0.0 build 2742]