SpeedFusion Cloud with StarLink and Verizon

dnavany · March 15, 2022, 6:40pm

Hi
I have a use case with a very picky customer who needs high-definition business video conferencing in the middle of nowhere. We have StarLink and a Verizon LTE (not LTE-A) connection using a SpeedFusion Cloud setup with FEC to bond the 2 connections. The 2 connections have very similar latency. However, most of the traffic is going over Verizon, with 20-30GB usage a day, which is very expensive versus unlimited Starlink.
Is there a way to balance the allocation between StarLink and Verizon so that most of the traffic goes to Starlink, with Verizon filling in the gaps, on a SpeedFusion Cloud connection? I can clearly allocate with an outbound policy rule using a weighted balance, but then I lose the benefit of SpeedFusion Cloud.
David

C_Metz · March 15, 2022, 8:36pm

I haven’t personally tested the effectiveness of this, but here’s what I would set and then test with some simulated failures to make sure it doesn’t drop the call.

In your SpeedFusion profile, set Verizon to priority 2 and Starlink to priority 1. That will put Verizon into standby for the tunnel and use Starlink as the primary… I just haven’t tested failover in that scenario, so I don’t know how fast it will pick up on a failure. Hopefully someone else here has tested it and can speak to it.

dnavany · March 15, 2022, 8:56pm

@C_Metz Thanks for your thoughts. I’ve tried that. Even on Faster or Extreme, Starlink sort of “browns out” and gets congested when it hands over a satellite, but it is still passing the link failure detection packets, and Teams then drops for about 15-20 seconds.

mystery · March 15, 2022, 9:32pm

Do you want to use Smoothing instead?

There are different times when you want to use Smoothing versus FEC versus both.

If the packets are being duplicated (Smoothing, etc), then the bandwidth will be used on all links.

You can try to throttle the bandwidth to the device.

Is the Starlink not stable enough to use primarily, and then failover to VZW?

C_Metz · March 15, 2022, 10:08pm

@dnavany I think I found what you might be looking for. You have to go to the support.cgi and enable PepVPN Traffic Distribution modes. Then you can select Weighted Round Robin. Take a look and let me know if it solves your issue.

dnavany · March 15, 2022, 10:18pm

@mystery - surely if I use smoothing then I’m going to duplicate all traffic across VzW and he’s going to have an even larger bill with all of the traffic duplicated over VzW and StarLink? Am I missing something in my understanding of smoothing?

Starlink isn’t stable enough yet to handle as sole primary, but we just got allocated a v2 StarLink and I’ve heard that they perform better.

The challenge seems to be the slowdown / congestion on StarLink, which doesn’t trigger the hot fail behavior that @C_Metz has suggested above but is enough to tell Teams that there is a problem. FYI, the problem doesn’t exist on Zoom or GoTo Meeting … they seem more tolerant.
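
To illustrate why a brown-out can slip past link failure detection, here is a minimal Python sketch. It assumes the health check only declares a link down after several consecutive lost probes; that is an assumption for illustration, not documented Peplink behavior, and `link_declared_down` and both probe patterns are hypothetical.

```python
def link_declared_down(probe_results, consecutive_failures_needed=3):
    """Return True if the probes ever fail the required number of times in a row.

    probe_results is a list of booleans: True = probe answered, False = probe lost.
    The threshold of 3 is an assumed value, not a Peplink default.
    """
    streak = 0
    for ok in probe_results:
        streak = 0 if ok else streak + 1
        if streak >= consecutive_failures_needed:
            return True
    return False


# Hypothetical brown-out: half the probes are lost, but never 3 in a row.
brown_out = [True, False, True, True, False, False, True, False] * 5
# Hypothetical hard outage: 3 probes lost back to back.
hard_outage = [True, True, False, False, False, True]

loss_rate = brown_out.count(False) / len(brown_out)

print(link_declared_down(brown_out))    # False
print(link_declared_down(hard_outage))  # True
print(loss_rate)                        # 0.5
```

In this toy model the link loses half of its probes yet is never marked down, which matches the symptom described: the tunnel stays on Starlink while the call quality collapses.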

dnavany · March 15, 2022, 10:21pm

@C_Metz I’m going to have to read up on Weighted Round Robin theory. Feels like a decade since I used that!!! I’ll try it tomorrow and report back. Thank you!

dnavany · March 16, 2022, 7:07pm

@mystery Thank you for your idea on WAN smoothing. I set up a new SFC connection only on Starlink but with both streams of WAN smoothing over Starlink. I then had a fall-through rule in Outbound policies to an equal priority mix of Starlink and VzW. My link failure detection was set to Extreme (under 1 second), but it didn’t fall through to the next rule and bring in VzW.

It managed to handle a few “glitches” today and recovered with about 2 seconds of interruption to the Teams calls, which is hugely better but still not acceptable to my client. I’ll keep experimenting to see if I can find a better way. All ideas still welcome.

@C_Metz I also tried Weighted Round Robin. What I suspect happened is that VzW was unable to increase when Starlink got congested. Does that make sense?

C_Metz · March 16, 2022, 8:11pm

EDITED: I withdraw my previous statement because in the chart above, Starlink just disappears and there is no latency increase before it does, so my idea of what works on a cell connection won’t work in that instance

C_Metz · March 16, 2022, 9:36pm

One more piece of information to help you in this fight… Teams has a very specific behavior when setting up a call that you might be able to exploit… Let’s say the user you are calling is on 192.168.1.25 on his local network, NATed somewhere on the internet to, let’s say, 2.2.2.2. Now remember there are 2 people involved in every call and 2 directions, so this happens in each direction…

Your PC attempts a UDP connection to:

1. 192.168.1.25, which fails… This connection works when 2 people are sitting in the office together.
2. 2.2.2.2, which has a decent chance of working, depending on the other user’s firewalls and NAT type.
3. A Microsoft TURN server.

It does all of these on UDP, and if they all fail it goes to TCP and SSL depending on configuration, but that’s not important for my thought here.

Now here’s the fun part… When it succeeds with one of the 3 above… it will have 2 active sessions in your session table… One of the 3 options above as primary… AND a connection to the MS TURN server for backup. If you end up using a TURN server as your primary connection for the call, the software opens a secondary connection to a different TURN server. There are always 2 UDP sessions established for the same call. So there may be something in outbound policy you can do by looking at the sessions established and trying to get them to take different paths… Again, not a fully baked thought, just some more info to help you in your quest for reliability.

Oh also, I noted that a user who left his wireless and wired NICs both connected had everything above done twice by the software… meaning 2 primary paths and 2 backup paths… the software on the laptop seems to think… hey if I have 2 NICs, I better use them both for reliability.

mystery · March 16, 2022, 10:46pm

Very interesting

dnavany · March 16, 2022, 11:54pm

@C_Metz Thank you so much for your insights. I’ll spend some time investigating.
My client came up with a brute force suggestion … add a second Starlink. I worry that both Starlinks will hand over satellites at exactly the same time. I also thought about finding a way to give the VzW connection higher latency, and wonder if I can degrade it by moving it to another Peplink connected via WiFi WAN.

C_Metz · March 17, 2022, 12:04am

WiFi WAN on my MAX Transit has 3ms latency on 5 GHz and 8ms on 2.4 GHz, so it adds some, but not a lot.

dnavany · April 21, 2022, 2:26pm

@C_Metz
After a lot of experimentation, I have found a very workable solution. Key items:

Bond the Starlink and Cellular connections at priority one as shown with a weighted round robin as you suggested above.

Set bandwidth on Starlink and Cellular on a 10:1 ratio. So 250/25 for Starlink and 25/2.5 on cellular.

I’m assuming that because both connections are passing data, as Starlink starts to glitch FEC covers while the load goes to cell and then comes back to Starlink. No complaints about connectivity for over a week, and the second Starlink is now sitting doing nothing. There’s 150GB of cell bandwidth per month, and I’m assuming that I can tune the bandwidth ratios between Starlink and cellular if I need to optimise further.
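
For anyone who, like me, hasn’t looked at weighted round robin in a while, here is a minimal Python sketch of how a 10:1 weighting spreads packets. It is a generic “smooth” weighted round robin, not Peplink’s actual scheduler; the `smooth_wrr` helper and the link names are hypothetical, and it assumes the 10:1 bandwidth settings translate directly into distribution weights.

```python
from collections import Counter


def smooth_wrr(weights, n):
    """Pick a link for each of n packets using smooth weighted round robin.

    weights maps a link name to an integer weight. Each round every link
    gains its weight in credit, the link with the most credit is chosen,
    and the chosen link pays back the total weight.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


# Assumed weights taken from the 10:1 bandwidth ratio described above.
picks = smooth_wrr({"starlink": 10, "cellular": 1}, 110)
print(Counter(picks))  # Counter({'starlink': 100, 'cellular': 10})
```

Under these assumptions cellular carries 1 packet in every 11, so it is always passing some traffic, which fits the idea that it is already “warm” when Starlink glitches.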

Thank you!

erickufrin · April 21, 2022, 6:28pm

Setting the traffic distribution policy to “bonding” will make SpeedFusion consume more Starlink bandwidth.

Weighted policies don’t work well when one WAN has dramatically higher throughput than the other.

Spencer_Honeyman · August 31, 2022, 7:11pm

@dnavany

I’m in a similar boat with Starlink and an LTE connection. I am curious how much data gets used in a month on this setting of FEC with a 10:1 ratio? I currently only have 15 GB/month on my data plan but may need to increase this. The use case is Zoom for a home office that needs a very stable connection for coaching and therapist work. Any thoughts appreciated. Cheers!

dnavany · August 31, 2022, 7:19pm

@Spencer_Honeyman
Our set up was primarily using Teams which was typically consuming about 2-3GB an hour for HD video conferencing. For a typical work month the setup was using about 60GB of data on the cellular connection but I haven’t tried to track the split between Starlink and LTE as we have very limited access to the location and its data. Sorry that I can’t be more help.
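
If it helps with sizing a plan, a rough back-of-the-envelope estimate can be sketched in Python as below. Everything here is an assumption for illustration: the `cellular_gb_per_month` helper is hypothetical, it assumes traffic splits exactly by the 10:1 weights, and the `overhead` parameter is only a placeholder for FEC overhead and for load shifting to cellular during Starlink glitches, neither of which I have measured.

```python
def cellular_gb_per_month(hours_per_month, gb_per_hour,
                          starlink_weight=10, cell_weight=1, overhead=0.0):
    """Naive estimate of monthly cellular usage under a weighted split.

    hours_per_month: hours of video calls in the month (your own figure)
    gb_per_hour:     total call traffic per hour across both links
    overhead:        assumed extra fraction on top of the pure weighted share
    """
    cell_share = cell_weight / (starlink_weight + cell_weight)
    return hours_per_month * gb_per_hour * cell_share * (1 + overhead)


# Example with made-up inputs: 40 hours of calls at 2.5 GB per hour.
estimate = cellular_gb_per_month(hours_per_month=40, gb_per_hour=2.5)
print(round(estimate, 1))
```

Treat the result as a lower bound rather than a forecast; the roughly 60GB a month we actually saw on cellular is the better guide.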

Spencer_Honeyman · August 31, 2022, 11:13pm

That is helpful actually. Thanks for that info and for laying out your solution above. May need to increase data plan but will experiment to see if it works for me as well. cheers