More details around WAN Links?

I know we have WAN Quality, connection up/down, and a few other things, but one thing we’re missing that would have really helped with a recent troubleshooting session is packet loss. I think the Peplink may already be gathering this data (and more) as part of health checks. Would it be possible to expose this somewhere?

Here’s an example of what Meraki does. On this MX250 I have two WAN links (WAN1/WAN2). It shows live usage up top (by the second, which Peplink already does) and latency over the week (it does 24h, 7d, and 30d), which Peplink also does, but the kicker here is packet loss:

Recently, I was troubleshooting a link that had significant packet loss and I was at a loss: I couldn’t really use ping because ping tests in Peplink are limited to 5 pings (no continuous mode), and while a WAN analysis will kind of work, it doesn’t show historical data and it takes multiple steps to set up.

Here’s the latency graph from a Meraki MX75 at a client location that we used to tell Spectrum (local ISP) that their link was trash :slight_smile:

One day:

One Month:

With a Peplink device using the same link, I can only see this (retransmits):

But in this case, the latency graph on the WAN Quality Report showed nearly everything fine despite 10-20% packet loss for a month.

Happy to demonstrate this, but I think this is a fairly reasonable ask :slight_smile:

cc @Giedrius

3 Likes

Hi Christopher,

Thank you for sharing the idea; it does look interesting :wink: Let me bring this to the team for discussion.

2 Likes

@ChristopherSpitler I think we can combine this with the Ping health check per WAN and include a packet loss presentation in the WAN Quality report. This should satisfy both health check and reporting purposes… Thoughts!?

7 Likes

Seems like that would be perfect!

1 Like

@ChristopherSpitler Good idea!
@Giedrius / @Eddy_Yeung This would be awesome to implement. I’d also like to get alerts if latency or packet loss goes over a set threshold.

3 Likes

I can only strongly support this idea, especially if we can define some kind of “cut-off quality” (a combination of latency and packet loss) on a WAN connection in order to “deactivate” it in both SpeedFusion tunnels and some outbound policy algorithms, as described here: Cut-off latency in Outbound Policy priorities

Note however that what counts as packet loss depends on what you measure: loss on the ping tests and TCP retransmissions can be monitored, but plain UDP traffic that gets dropped along the way goes unnoticed by the router.
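
To make the “cut-off quality” idea concrete, here is a minimal Python sketch of a check that combines both metrics. It is only an illustration: the names `LinkSample` and `link_usable` and the 150 ms / 5% thresholds are made up for this example and are not Peplink settings or defaults.

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    latency_ms: float  # average round-trip time of probes that got a reply
    loss_pct: float    # share of probes that got no reply, 0-100


def link_usable(sample, max_latency_ms=150.0, max_loss_pct=5.0):
    """Return True only if the link stays within BOTH cut-off limits.

    The default thresholds are hypothetical values chosen for illustration.
    """
    return sample.latency_ms <= max_latency_ms and sample.loss_pct <= max_loss_pct


samples = {
    "WAN1": LinkSample(latency_ms=28.0, loss_pct=0.5),
    "WAN2": LinkSample(latency_ms=31.0, loss_pct=15.0),  # fine latency, heavy loss
}

# Links that pass the combined check; a latency-only check would keep WAN2.
usable = [name for name, sample in samples.items() if link_usable(sample)]
print(usable)
```

With a latency-only cut-off, WAN2 in this example would still be considered healthy, which is the gap the combined check is meant to close.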

1 Like

In other implementations of this, transmission of user-plane traffic is not taken into account. What gets monitored and measured is traffic the router generates itself. I don’t expect to be able to use it to give me information on what my UDP packets are doing, but I do expect it to provide general packet loss from its own ping/MTR testing happening in the router.

Thanks for the support :slight_smile:
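
As a rough sketch of what loss from router-generated probes means, the Python below summarizes one window of ping results. It assumes each probe is recorded as a round-trip time in milliseconds, or `None` when no reply arrived; the function name `summarize_probes` and the sample values are hypothetical and do not describe how Peplink or Meraki compute their reports.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize one window of probes; None marks a probe that got no reply."""
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    sent = len(rtts_ms)
    lost = sent - len(replies)
    return {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        # Latency can only be averaged over probes that actually came back.
        "avg_latency_ms": mean(replies) if replies else None,
    }


# Ten made-up probes, two of which were lost.
window = [21.0, 22.0, None, 20.0, 23.0, None, 22.0, 21.0, 20.0, 23.0]
print(summarize_probes(window))
```

In this made-up window the average latency is 21.5 ms while 20% of probes were lost, so a graph that plots only latency can look healthy on a lossy link.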

2 Likes

+1 to this too.

The current “WAN Quality” report in IC2 is already pretty good as a starting point. Perhaps a couple of enhancements could be made to that feature:

  1. Let us specify the actual target(s) that generate those quality reports, both per WAN and perhaps “global” ones that are used by all WANs on a device. This would allow us to monitor things we care about directly, and also to target the same thing consistently across multiple WANs / providers.

  2. Add more detail and metrics as mentioned by Christopher. Things like loss and jitter might be good to graph in here alongside the latency.

  3. Expose the counters from these probes in SNMP too, not just via IC2, so we can easily take the data into our own monitoring tools. We do this for the IP SLA probes on numerous Cisco routers and the SD-WAN health checks in FortiGate firewalls, and they provide excellent data that we can store and trend / alert on. IC2 is great, but we keep years’ worth of metrics in our own systems at very high resolution / granularity, which has proved very handy when showing quality changes or degradation over time to providers.
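
To illustrate item 1, here is a small Python sketch of how global and per-WAN probe targets could be combined. Everything in it is hypothetical: `GLOBAL_TARGETS`, `PER_WAN_TARGETS`, and `targets_for` are invented names, not an existing IC2 or device setting, and the addresses come from the reserved documentation ranges.

```python
# Hypothetical configuration: targets probed on every WAN of the device.
GLOBAL_TARGETS = ["192.0.2.1"]

# Hypothetical configuration: extra targets that only matter for one WAN.
PER_WAN_TARGETS = {
    "WAN1": ["198.51.100.10"],
    "WAN2": [],
}


def targets_for(wan):
    """Return global targets followed by the WAN's own, without duplicates."""
    combined = []
    for target in GLOBAL_TARGETS + PER_WAN_TARGETS.get(wan, []):
        if target not in combined:
            combined.append(target)
    return combined


for wan in ("WAN1", "WAN2"):
    print(wan, targets_for(wan))
```

Because every WAN probes the shared global targets, results for the same destination can be compared directly across providers.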

2 Likes