Serious Streaming Issues (100% CPU)

What health check interval are you using with the PepVPN? When it is set to "extreme", I have been able to keep streams to YT Live and many other platforms going just fine when I pull a WAN cable or eject a SIM. In that setup the Ethernet WAN is set to 1st priority and the two LTE modems to 2nd priority, so it is not SF Bonding but SF Hot Failover.
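A rough way to picture that failover-order setup: links are grouped by priority, and only the healthy links in the best available priority group carry tunnel traffic. This is a hypothetical Python model (the `Link` class and `active_links` function are made up for illustration), not Peplink's implementation, and it assumes that links sharing a priority are used together.

```python
"""Hypothetical model of priority-based hot failover (not Peplink's code).

Only the healthy links in the lowest-numbered priority group that still has
a healthy member carry traffic; everything else stays on standby.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    name: str
    priority: int   # 1 = preferred, 2 = standby, ...
    healthy: bool


def active_links(links: List[Link]) -> List[str]:
    """Return the names of the links that should carry tunnel traffic."""
    healthy = [link for link in links if link.healthy]
    if not healthy:
        return []  # nothing usable: the tunnel would be down
    best = min(link.priority for link in healthy)
    return [link.name for link in healthy if link.priority == best]


if __name__ == "__main__":
    links = [
        Link("Ethernet WAN", 1, True),
        Link("LTE A", 2, True),
        Link("LTE B", 2, True),
    ]
    print(active_links(links))  # ['Ethernet WAN']
    links[0].healthy = False    # simulate pulling the WAN cable
    print(active_links(links))  # ['LTE A', 'LTE B']
```

The point of the model is that the standby links are already configured and healthy, so failover is just a change in which group is selected rather than building a new path.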

I use Dummynet pipes to simulate various WAN links. A quick and dirty way to do this is to install pfSense on a small x86 box (or even in a VM), add some VLAN interfaces to it and break them out via a managed switch; you can then specify random packet loss / latency values for each interface. I have this set up in my lab, where I can simulate a dozen different WAN links with high latency / loss etc.
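For reasoning about what such a pipe does to traffic, here is a toy Python model of one: a FIFO queue drained at a fixed bandwidth, plus a fixed delay and random packet loss. It is a sketch under simplifying assumptions (no queue-size limit, independent random drops decided on arrival), and `simulate_pipe` and its parameters are hypothetical, not part of pfSense or Dummynet.

```python
"""Toy model of a Dummynet-style pipe: bandwidth + fixed delay + random loss.

Only meant for reasoning about test results; it is not Dummynet's code and
ignores queue-size limits. Parameter names are assumptions.
"""
import random
from typing import List, Optional, Tuple


def simulate_pipe(packets: List[Tuple[float, int]], bandwidth_bps: float,
                  delay_s: float, loss_rate: float,
                  seed: int = 0) -> List[Optional[float]]:
    """Return the delivery time of each (arrival_time_s, size_bytes) packet,
    or None if the pipe dropped it. Packets must be sorted by arrival time."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    link_free_at = 0.0         # time at which the pipe finishes its queue
    deliveries: List[Optional[float]] = []
    for arrival, size in packets:
        if rng.random() < loss_rate:
            deliveries.append(None)  # dropped on arrival, never queued
            continue
        start = max(arrival, link_free_at)               # wait behind queue
        link_free_at = start + size * 8 / bandwidth_bps  # serialisation time
        deliveries.append(link_free_at + delay_s)        # plus fixed delay
    return deliveries
```

For example, two 1250-byte packets arriving together on a 1 Mbps pipe with 50 ms delay are delivered at roughly 60 ms and 70 ms: the second one queues behind the first, which is the same effect that makes latency climb on a saturated link.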

Currently the PepVPN setting for Link Failure Detection Time is "Recommended". I can test it with "Extreme".
The thing is, another reason for using the Peplink solution is session-stable connections. If I set it to Extreme and put the LTEs in 2nd priority, that would drop the VPN tunnel and a switch to FusionCloud as the fallback would occur, right? In that case I would lose session-dependent connections, for example during a GoToWebinar, so the primary goal is to keep the VPN tunnel alive as long as possible.
When does the Peplink router decide that a WAN link is down? Pulling the cable is the extreme case, yes, but what if the throughput breaks down to a few hundred kilobits/s while the link is still alive?
Is there a way to set a "minimum" bandwidth that a WAN link must provide for it to be included in the SpeedFusion tunnel?

No, the sessions should still be persistent and the standby links are brought into active use.

Here is an example of a TST we are using right now: the venue has given us a 100/100 WAN, and we are getting 80+ Mbps up/down on both LTE WANs. Bonding these together actually hurt performance, so they are all just in failover order.

Testing this in rehearsals by pulling the WAN cable, the stream (one RTMP feed to YT Live and one RTMP feed to a private platform) was unaffected: failover took <1 second and the session persisted. We have 3 VPN hubs configured on this TST, but the secondary hubs are only used when the primary hub is totally unavailable.

Yes, it is the extreme failure scenario. When something is live, I tend to watch the bandwidth going in/out via each link on the PepVPN status page, and I can manually disable a path if it looks to have gone bad.

We tweak the health checks on the WANs to be more aggressive, and actually ping two targets deep within our own network, so we have a fair idea of reachability to both the internet and our own infrastructure and can trigger a WAN health failure on low-level packet loss. You do have to find values that work, though: too aggressive and the links may thrash between healthy and unhealthy states.
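The usual guard against thrashing is to require several consecutive failed rounds before declaring a link down, and more consecutive good rounds before bringing it back. Below is a minimal Python sketch of that logic with two ping targets, where a round only fails if neither target answers. The class name and the default thresholds (3 to fail, 5 to recover) are illustrative assumptions, not Peplink settings or defaults.

```python
"""Hypothetical WAN health check with two ping targets and hysteresis.

A probe round fails only if BOTH targets fail to answer. The link goes down
after `fail_threshold` consecutive failed rounds and comes back after
`recover_threshold` consecutive good rounds. Thresholds are assumptions.
"""


class HealthCheck:
    def __init__(self, fail_threshold: int = 3, recover_threshold: int = 5):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.healthy = True
        self._streak = 0  # consecutive rounds contradicting the current state

    def record(self, target_a_ok: bool, target_b_ok: bool) -> bool:
        """Feed one probe round; return the (possibly updated) health state."""
        round_ok = target_a_ok or target_b_ok
        if round_ok == self.healthy:
            self._streak = 0  # round agrees with the current state: reset
            return self.healthy
        self._streak += 1
        needed = self.recover_threshold if round_ok else self.fail_threshold
        if self._streak >= needed:
            self.healthy = round_ok  # flip state only after a full streak
            self._streak = 0
        return self.healthy
```

Lowering `fail_threshold` makes the check react faster to loss, and raising `recover_threshold` above it is what stops a marginal link from bouncing straight back into use.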

As far as I know, there is currently no feature in Peplink to say "if observed throughput is lower than X, consider the WAN useless", as they do not do any kind of active measurement of the links involved in the VPN. We actually returned to Peplink recently from a different product that had this feature, and honestly it was often trigger-happy about declaring a link unusable; all it really did was generate a huge amount of excess usage performing active measurements.

Perhaps in the future Peplink could introduce some sort of hysteresis curve that would allow this to be done passively, but such methods also typically require good knowledge of the historical performance of a given link, which is not ideal when you are using it in one location for a few hours at most.
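To make the hysteresis idea concrete: a passive gate could smooth observed throughput with an exponentially weighted moving average and use two thresholds, excluding the link below a low one and re-including it only above a higher one. This is a hypothetical Python sketch of that general technique, not an existing Peplink feature; `ThroughputGate`, its thresholds and the `alpha` smoothing weight are all assumptions.

```python
"""Hypothetical passive 'minimum throughput' gate with a hysteresis band.

Observed throughput samples (Mbps) are smoothed with an EWMA. The link is
excluded when the average drops below `exclude_below` and re-included only
once it rises above the higher `include_above`; values in between never
change the state.
"""
from typing import Optional


class ThroughputGate:
    def __init__(self, exclude_below: float, include_above: float,
                 alpha: float = 0.2):
        if not exclude_below < include_above:
            raise ValueError("exclude_below must be lower than include_above")
        self.exclude_below = exclude_below
        self.include_above = include_above
        self.alpha = alpha  # weight given to the newest sample
        self.average: Optional[float] = None
        self.included = True

    def update(self, sample_mbps: float) -> bool:
        """Feed one throughput sample; return whether the link is included."""
        if self.average is None:
            self.average = sample_mbps
        else:
            self.average = (self.alpha * sample_mbps
                            + (1 - self.alpha) * self.average)
        if self.included and self.average < self.exclude_below:
            self.included = False
        elif not self.included and self.average > self.include_above:
            self.included = True
        return self.included
```

The catch mentioned above shows up in the constructor: someone has to pick `exclude_below` and `include_above`, and choosing them well needs some knowledge of what the link normally delivers.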

Thank you for your long answer, I really appreciate it!

In your current scenario, the "WAN is gone" trigger depends on your configured health check method, right? Hmm… I think this will need some time to figure out if it does not depend directly on a bandwidth limit.

This is a more realistic scenario I am testing right now. The WAN connection (local internet in the office) is currently fully occupied by the Synology NAS doing backups to another NAS. So latency is rising a lot, and it seems the router handles this better than a link with low latency but limited throughput. I also disabled encryption, which looks to have freed up a few percent of CPU. Currently testing again with WAN smoothing set to Normal and FEC set to Low.

We will continue testing… So even when there is no active WAN in a PepVPN tunnel (at the moment of failover), the link and the sessions are still persistent? I have to test this as well. And yes, of course, we will use two different LTE providers out in the field.

Thanks!

Best regards, Martin

P.S.: Do you have a walkthrough method for testing all the WAN connections? Did you make yourself a list of outbound policies for a config laptop to send traffic to each WAN alone, and also handle captive portal logins on the venue's Wi-Fi where needed?

In that exact example, yes. The targets for the health check are two IPs within our core network (if both are unable to answer, it is safe to say two different datacentres have gone dark for me!).

This is possibly a result of how I believe Peplink does its passive measurements of the WANs, but someone more knowledgeable can probably explain that better than me.

Out of interest, did you try the Dynamic Weighted Bonding (DWB) option? You can enable it for the tunnels by visiting the support.cgi page. I have had mixed results with it, but it has proven effective when links are very variable or less closely matched in latency / loss / capacity. There are some threads on here about it, so perhaps something else to look into.

Of course, your mileage may vary, but in my experience with the Extreme setting for the PepVPN health check, the standby links are brought into use so quickly that no sessions drop or expire.

I tend to use the WAN Analysis tester against a spare FusionHub we host in the same location as the production units, as this proves the end-to-end capacity of each link between the remote network and mine outside of the PepVPN. After that, I use the PepVPN bandwidth test itself to verify VPN performance on site, plus good old iPerf to a server in our network and to public speedtest servers. We even host a Speedtest.net server in the same location as one of our FusionHubs, so I can benchmark the VPN with their tool without testing against some random (and often severely underprovisioned and overloaded) server on the public internet.
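One way to make the iPerf part of that routine repeatable is a small wrapper that runs iperf3 with JSON output (`-J`) and pulls out the TCP totals. This is a sketch under assumptions: iperf3 is installed on the test laptop, the server you pass in is your own, and an outbound policy on the router (set up separately) sends traffic from each `bind_ip` out of one specific WAN. `run_iperf3` and `tcp_throughput_mbps` are hypothetical helpers, not part of any Peplink tool.

```python
"""Hypothetical helpers for repeatable per-WAN iPerf runs.

Assumes iperf3 is installed and that a router outbound policy steers traffic
from each source address (bind_ip) out of one specific WAN.
"""
import json
import subprocess
from typing import Optional, Tuple


def run_iperf3(server: str, seconds: int = 10, bind_ip: Optional[str] = None,
               reverse: bool = False) -> dict:
    """Run one iperf3 TCP test and return its parsed JSON report."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-J"]
    if bind_ip:
        cmd += ["-B", bind_ip]  # source address matched by the outbound policy
    if reverse:
        cmd.append("-R")        # server sends, i.e. measure the download side
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def tcp_throughput_mbps(report: dict) -> Tuple[float, float]:
    """Return (sent, received) Mbps from an iperf3 TCP JSON report."""
    end = report["end"]
    return (end["sum_sent"]["bits_per_second"] / 1e6,
            end["sum_received"]["bits_per_second"] / 1e6)
```

In practice you would keep a small mapping of WAN name to the laptop source address its outbound policy matches, and call `run_iperf3` once per entry in each direction, so every link is tested the same way on every site.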

Hi, just following your thread.

We are streaming on MAX 700s and get locked out of the user interface when the router starts being used. Fortunately everything else seems to work OK, but we can't log in until the traffic load drops. The thing is, we're only talking 20-30 Mbps to lock the routers out.

You said here that they were HW2 versions. The MAX 700 HW2 has a VPN limit of 25 Mbps because of CPU / hardware restrictions, so that would make sense.

Yes, I just saw the reply on the other thread, so it may not be the same issue causing a similar result.

I do wish the CPUs could handle more bandwidth while managing tunnels. This is my major problem with the Max Transit. I have two LTE connections that can provide 100+ Mbps each, but when bonded, they give me a maximum of 65 Mbps due to CPU restrictions. An unfortunate bottleneck.
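The arithmetic behind that bottleneck is simply that a bonded tunnel can never exceed the lower of two ceilings: the combined WAN capacity and the device's own VPN throughput limit. A minimal Python sketch, using the numbers from the post above; the function name is hypothetical, and it deliberately ignores encapsulation overhead and CPU spent on other features, so real throughput will be lower still.

```python
"""Simplified ceiling for a bonded SpeedFusion tunnel (hypothetical model).

Assumes the tunnel is capped by whichever is lower: the sum of the WAN
capacities or the device's VPN throughput limit. Overheads are ignored.
"""
from typing import Sequence


def bonded_ceiling_mbps(link_mbps: Sequence[float],
                        device_vpn_cap_mbps: float) -> float:
    """Upper bound on tunnel throughput in Mbps."""
    return min(sum(link_mbps), device_vpn_cap_mbps)


if __name__ == "__main__":
    # Two 100 Mbps LTE links, but the box tops out at 65 Mbps of tunnel traffic.
    print(bonded_ceiling_mbps([100, 100], 65))  # 65
```

Once the device limit is the smaller term, adding or upgrading WAN links changes nothing; only a model with a higher VPN limit moves the ceiling.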

To be fair to Peplink, this is one way they differentiate between a Max Transit, an MBX-4 and the SDX: higher-performance models are available with more powerful guts (and the obvious increase in price).

Differentiating the performance / capabilities of a product in this way is quite typical of all network equipment vendors.

I understand your comment; however, the Max Transit Duo is advertised to support 150 users and 400 Mbps. That is at least a small/medium office expectation.

Fair enough, but when you look at the advanced features that are clearly intended to support a larger client base with enterprise-class requirements…

…all of a sudden you need to turn on SpeedFusion and have the Max Transit Duo manage tunnels. Then the router drops to 65 Mbps, best case.

This seems to suggest a different expectation and purpose for this particular router. Which is it, really?

I can't find this "Dynamic Weighted Bonding" option on the Transit Duo. Can someone give me a hint where to find it? Thx!

Log into your router either directly or via InControl, then in your browser's address bar change "index.cgi" to "support.cgi"; it is hidden there.

Well, if I click on that, it brings me right back to the normal GUI PepVPN overview tab.

After you enable it on the /support.cgi screen, go to the router UI, Advanced > SpeedFusion tab, and you'll see an additional line that lets you choose the Traffic Distribution Mode.

OK thx, I see it now. This whole thing with hidden features on Pepwave devices is pretty strange!

Why can't we have a simple toggle, like on Cisco and co., to switch between simple and advanced mode? Simple is the out-of-the-box view; advanced just shows all options…

I suspect that the Traffic Distribution Mode capability is somewhat experimental. Peplink actually says that Dynamic Weighted Bonding MAY improve things for LTE connections, so it IS somewhat experimental.
It kind of makes sense to bury it a little deeper in the interface until such mode changes are more proven. My two cents.

Is FEC even worth having on when WAN Smoothing is ON?

I would have thought it's just causing extra CPU load, given you are already sending the packet down every WAN and then using whichever copy gets there first.
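A back-of-the-envelope model helps frame that question. With smoothing sending one copy per link, a packet is lost only if every copy is lost; a single-parity FEC scheme (one parity packet per `k` data packets) can then rebuild one loss per group. The Python sketch below compares residual loss and bytes on the wire under loudly stated assumptions: independent random losses, a simple single-parity FEC, and made-up example figures (2% loss per link, `k = 4`). It is not a description of how Peplink implements either feature or of the ratios behind its Low/High FEC settings.

```python
"""Rough comparison of WAN smoothing vs smoothing + FEC (hypothetical model).

Assumptions: independent random losses; smoothing sends one copy of every
packet on each link; FEC adds one parity packet per k data packets and can
rebuild at most one lost packet per group.
"""
from math import prod
from typing import List


def smoothing_loss(link_loss: List[float]) -> float:
    """A packet is lost only if the copy on every link is lost."""
    return prod(link_loss)


def fec_loss(p: float, k: int) -> float:
    """A data packet is unrecoverable if it is lost AND at least one of the
    other k packets in its group (k-1 data + 1 parity) is also lost."""
    return p * (1 - (1 - p) ** k)


def overhead(copies: int, k: int = 0) -> float:
    """Bytes sent per byte of payload (k = 0 means FEC is off)."""
    return copies * ((k + 1) / k if k else 1.0)


if __name__ == "__main__":
    r = smoothing_loss([0.02, 0.02])  # two LTE links, about 0.04% residual loss
    print(f"smoothing only:       loss {r:.2e}, overhead {overhead(2):.2f}x")
    print(f"smoothing + FEC(k=4): loss {fec_loss(r, 4):.2e}, "
          f"overhead {overhead(2, 4):.2f}x")
```

Under these assumptions, adding FEC on top of smoothing shrinks an already tiny residual loss further while raising the traffic from 2x to 2.5x the payload, which is why the answer depends on how lossy and how correlated the links actually are.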

There are some good past threads / posts explaining FEC and Smoothing, when to use one or the other, or both simultaneously.

We have to test it further, but for live streams, such as streaming to YT/FB over LTE WANs, it is advertised to give better results with fewer lost packets, etc.