Load-balancing 4 DHCP VSAT WANs

I apologize for the long post, but I wanted to include pertinent information about this ongoing, developing site. This post seeks advice on the following topics:

  • load-balancing multiple satellite links (VSAT - Surfbeam2 - Viasat/Exede Internet)
  • Ethernet negotiation with SB2 modems
  • MTU size for Viasat/Exede/WildBlue
  • WAN Health-check recommended configuration - specific to Surfbeam 2 satellite modems (or Viasat’s SB2 Plus modem)
  • advice on config to connect MAX BR1 Mini to Balance One (I went that route instead of USB modem for obvious reasons)
  • any experience with cellular boosters inline between the MAX BR1 and Yagi antennas in a MIMO installation

I have a client in a rural area with extremely limited ISP options and would like some advice with load-balancing multiple VSAT WANs from the same satellite ISP (Viasat/Exede). There are 4x WANS:

  • WAN1 - BUSINESS 50 - 3x persistent IP addresses (via DHCP), 35 Mbps down / 4 Mbps up (SB2+ Modem bridge mode enabled)
  • WAN2 thru 4 - FREEDOM 150, 12 Mbps down / 3 Mbps up, DHCP IP addresses (SB2 modems’ single eth port give public ip)

There are NO other ISP options here: no DSL, no WISP, no cable, no fiber. There is DSL in a town nearby, but there is a 1200 ft tall mountain in between that is a national park, so I doubt we’ll get permission to put a tower up to relay a wireless signal. On the other side is the Pacific Ocean, and there is nothing out there for a LONG way.

Have been using a Balance 20 for the past 4 years with some success, though not great. The main goal in this approach is to provide additional bandwidth/capacity in the summer months when the park is full. Guests are allowed to use ~1 Mbps down / 100 Kbps up. This is intended for social media, web browsing, communications, etc. Using Peplink’s excellent Outbound Policies I’ve been able to achieve effective use of all of the WANs, and to prioritize Mgmt traffic over Guests etc.

The biggest problem I’m having is with the WAN Health Check feature. I have tried every possible option and settled on DNS to 8.8.8.8. Ping was much less reliable (not sure if this is due to ICMP packets being lower priority on Viasat’s network or what). Many times the logs indicate the failure is “No cable detected” or something like that. Yes, I have replaced all cables and tested with forced port negotiation, MTU adjustments, etc. I have placed unmanaged switches between the modems and the B20. I have also placed managed switches between the modems and the B20 and set the ping target to those, in which case the health checks never fail, so I know the B20 itself works fine.

These health check failures happen many times per day, sometimes 5 or 6, sometimes 20+. I have installed over 300 Viasat dishes, so I’m very familiar with them and know what to expect. Yes, they have many limitations, but these dishes are properly pointed and peaked (RX SNR values on all = 8.7-9.6 dB, which in Beam 368 is excellent), resistance is normal (+/- 0.5 up to 1.5 ohms), and the cable is certified solid copper, yada yada yada. THEY ARE PROPERLY INSTALLED WITH GREAT SIGNAL AND CERTIFIED HARDWARE. A Surfbeam2 modem will occasionally lose sync with the satellite (3/week, maybe even 1/day 2 days in a row, maybe even twice in one day, but this is not common and shouldn’t happen on a properly installed system).

The bottom line is that not every health-check failure is actually a failure; many times the modem is actually online. For example: to use more than one of the persistent IPs on the business modem, you can put an unmanaged switch in between. I did this just to test what was going on and, lo and behold, with the B20 connected to the unmanaged switch and my laptop connected via Ethernet to the same switch, my laptop remained connected and was able to ping both IP addresses and domain names (i.e. not DNS trouble) the entire time, while the B20 experienced multiple health-check failures.

The SB2 modems are somewhat elusive in what they actually do. I think there is some TCP spoofing going on in there to “appear” to reduce the latency, but I don’t entirely understand it (it used to be called AccelNet, acquired by Viasat). What I do know is that when powered on they hand out a Class C private IP via DHCP in the 192.168.100.x range. Once they scan for, sync, range, and enter the network with the satellite, they then give a public IP to the attached device. The 192.168.100.1 address is still reachable, though, as the “modem status page” (also used to put the modem in install mode and enter a modem key to identify the spot beam).

I have tried setting up the B20 using this IP as a ping target. My thinking is that it should be reachable because it’s local, and if it’s not, then the modem probably “really truly” has lost sync and is down for 3-5 mins. I guessed that maybe the ICMP echo packets are being dropped somewhere along the path but after the SB2 modem, in which case the local IP of the modem would still respond and would not trigger a route drop and failover event when other traffic, for whatever reason, is still going out to the internet. This seems to have had a positive effect, and works better than anything else to date. However, while this may show a positive effect in InControl2 and the UniFi Network Dashboard as far as online/outages go, wouldn’t it have a negative effect for those using the network, since the route would become active as soon as the local modem IP starts responding, even though the link won’t truly have internet access for ~2 more minutes (assuming the modem lost sync and rebooted for whatever reason)?

I have tried many combinations of settings for the failure/success counts, interval, etc. Sometimes I find one that seems great for a while, and then some weeks or months later it will be constantly dropping. The current config I’m using is:

  Type:           DNS Lookup
  Target:         8.8.8.8
  Interval:       5
  Failure count:  10
  Success count:  2
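
To make the timing implied by these settings concrete, here is a minimal Python sketch of a health-check state machine. It assumes the counts are *consecutive* failures/successes and that the interval is in seconds; that is my reading of how such checks usually behave, not something confirmed for the Balance firmware. The class and method names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    """Toy model of a WAN health check (assumed semantics, not Peplink's code)."""
    interval_s: int = 5       # seconds between probes (assumed unit)
    failure_count: int = 10   # consecutive failures before the WAN is marked down
    success_count: int = 2    # consecutive successes before it is marked up again
    up: bool = True
    _fails: int = 0
    _oks: int = 0

    def observe(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the WAN state after it."""
        if probe_ok:
            self._fails = 0
            self._oks += 1
            if not self.up and self._oks >= self.success_count:
                self.up = True
        else:
            self._oks = 0
            self._fails += 1
            if self.up and self._fails >= self.failure_count:
                self.up = False
        return self.up

    def seconds_to_mark_down(self) -> int:
        """Rough time a real outage lasts before the WAN is dropped."""
        return self.interval_s * self.failure_count

    def seconds_to_mark_up(self) -> int:
        """Rough time after probes recover before the WAN is used again."""
        return self.interval_s * self.success_count
```

Under those assumptions the WAN is only dropped after roughly 50 seconds of continuous probe failures and is put back into rotation about 10 seconds after probes start succeeding. With a local target such as 192.168.100.1, that short recovery window is what would bring the route back before the modem has actually re-entered the network.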

What are the recommended “best practices” for a multi-WAN setup in this case:

  • SB2 modems in bridge mode, public ip to Peplink (existing configuration like this)
  • Managed switch inline between SB2 modem and Peplink with this as its ping target for health-check
  • Router installed on each WAN in between the SB2 modems and the Peplink, with each WAN on the Peplink having a simulated static IP, using the default gateway of each WAN as the health-check ping target
  • Manual MTU size, or Auto like Viasat recommends? If manual, what size?
  • Lock down port speed and duplex? If so, FD1000 or FD100? (The B20 only had 10/100 ports; not sure about the One)

Sometimes I think it may have something to do with the number of sessions, but I don’t really have any evidence to back this up; it just seems to happen more frequently when there are a lot of users on the network. Viasat will not give any information about max TCP connections, or anything else for that matter. MTU? “We can’t tell you”… their standard response. Even if the session count is a factor, it is not the only cause, since this also happens when few users are on the network.

Okay, you may have noticed that I’ve been talking about a B20 (3 WAN ports) while having 4 WANs at the park. This is because they recently added the 4th WAN. This has led to a network redesign… hence why I’m here. I want to know if anyone else has any knowledge or experience they can share.

Here is what I’m thinking as of now. I’ve replaced the B20 with a new Balance One with the 5-port WAN license. I did not purchase the SpeedFusion license yet, but if it’s really the way to go then I’m open to ideas. I do have a FusionHub license and an instance set up in AWS. Another development is that there is limited 4G LTE coverage available. I say limited because the signal seems to vary a lot, from barely usable to not usable; however, if you walk 50’ out into the field from the network cabinet location in the office, it is always stable. For this reason I have since ordered a new MAX BR1 Mini, with Yagi antennas for a MIMO installation and all installation hardware. So now we have 5 WANs…

The most important thing at the park is the store operations, POS, etc. Second is Mgmt WiFi, third is the video surveillance system (29x UniFi Video G3 cameras), and fourth is Guest WiFi. There are 2 physical networks to accommodate this: StoreNet is for the primary #1 function; WifiNet is everything else stated above. Each of these 2 networks has its own gateway (UniFi USG) handling DHCP, VLANs, firewall, guest portal, etc. The USGs have NAT disabled. Having 2 separate networks is what the owner wants, despite the extra cost of having 2 of everything. There have been a couple of instances in the past where a network problem affected the store operations that could have been avoided if the networks had been separate like they are now. Ubiquiti has been playing fast and loose with their QA/testing and has released multiple “stable” firmwares that were anything but (I could go on about this but…). Thus, we now have a StoreNet on separate physical hardware, running a known stable release, that is no longer affected by issues with the other network.

Right now I’m going back and forth on one particular topology issue. The Store-USG seems to work well when directly connected to the Business modem (WAN1), except that it also needs to be able to reach the 4G LTE link in a failover-only mode. I have had great success with load-balance configs on EdgeRouters at other sites, but not here, not to mention the USG is still back at EdgeOS 1.9.7hotfix3, which has a known load-balance bug. Furthermore, this configuration means that only the Store-USG obtains a static IP, which is not a deal breaker but certainly less ideal than having only FQDN with DDNS, especially when working with IPsec S2S VPN tunnels. I want to give StoreNet primary WAN1 (Business VSAT) with backup WAN5 (4G LTE) (and perhaps even use WAN5 as primary for POS). This can all be achieved with Outbound Policies. WifiNet should use a weighted balance across all available Freedom accounts (WAN 2-4). Again, Outbound Policies… no problem.
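
To spell out the intended policy logic, here is a rough Python model of the two rules described above: a priority list for StoreNet and a weighted balance for WifiNet. This is only an illustration of the decision logic, not Peplink configuration. The function name, the table layout, and the equal weights (chosen because the three Freedom plans have the same rated speed) are assumptions.

```python
import random

# StoreNet: strict priority, Business VSAT first, then 4G LTE as backup.
STORE_PRIORITY = ["WAN1", "WAN5"]

# WifiNet: weighted balance across the Freedom links.
# Equal weights are an assumption based on the identical plan speeds.
WIFI_WEIGHTS = {"WAN2": 1, "WAN3": 1, "WAN4": 1}


def pick_wan(network, healthy, rng=random):
    """Return the WAN a new session should use, or None if none qualifies.

    network: "StoreNet" or "WifiNet"
    healthy: set of WAN names currently passing their health check
    rng:     random module or random.Random instance (injectable for testing)
    """
    if network == "StoreNet":
        for wan in STORE_PRIORITY:
            if wan in healthy:
                return wan
        return None
    if network == "WifiNet":
        candidates = [w for w in WIFI_WEIGHTS if w in healthy]
        if not candidates:
            return None
        weights = [WIFI_WEIGHTS[w] for w in candidates]
        return rng.choices(candidates, weights=weights, k=1)[0]
    raise ValueError(f"unknown network: {network}")
```

For example, `pick_wan("StoreNet", {"WAN5", "WAN2"})` returns `"WAN5"` because WAN1 is not in the healthy set. Note that in this sketch WifiNet never spills onto WAN1 or WAN5; whether it should is a separate design decision.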

I suppose the main purpose of this entire post is to share my experiences with this network over the past 5 years and ask the community if they can offer any insights, advice, or additional information about these topics:

  • load-balancing
  • Ethernet negotiation with SB2 modems
  • WAN Health-check with satellite (SB2 modem)
  • Would SpeedFusion be a better solution?
  • advice on config to connect MAX BR1 Mini to Balance One (I went that route instead of USB modem for obvious reasons)
  • any experience with cellular boosters inline between the MAX BR1 and Yagi antennas in a MIMO installation

interesting installation. i have little recent experience with sat isps and none with bonding multiple but a LOT of experience with speedfusion, fusionhub and bonding over speedfusion.
i think that may be your solution. the critical differences are:

  1. speedfusion will bond the multiple paths so they appear as one fast connection to the fusionhub
  2. speedfusion does end to end checks on each path. so the instant that a path goes down or latency climbs over the limit you set it stops using that path until it recovers.
    for the specific problem you are trying to solve- a modem going down for several minutes speedfusion should be the answer.
    we use this between wan and cell backup at 500 locations for voip phones. we have it set to jump to cell path on latency over 400ms, packet loss or failure.
    two points: it jumps so quickly you do not even drop calls in progress and it changes paths even when both ends are clean but problem in the middle is causing trouble.
    before doing this, the wan itself could be perfect while something like a peering problem in the middle of the country caused trouble, and the pepwave did not go to cell because the health check was still passing on the wan. now it does.
    feel free to contact me directly at jscully@pizzacloud.net
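
As a rough illustration of the threshold-based switching described in this reply (not SpeedFusion’s actual algorithm, which is not documented in this thread), the rule “leave a path on high latency, packet loss, or failure” could be sketched in Python like this. All names are hypothetical. The 400 ms default comes from the WAN-plus-cell deployment above; a geostationary satellite path normally sits well above that, so the cutoff would have to be set higher for the VSAT links.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PathStats:
    name: str
    latency_ms: float   # most recent measured round-trip latency
    loss_pct: float     # most recent measured packet loss, in percent
    up: bool = True     # False if the end-to-end check failed outright


def usable(path: PathStats, max_latency_ms: float = 400.0,
           max_loss_pct: float = 0.0) -> bool:
    """A path qualifies only if it is up and within both thresholds."""
    return (path.up
            and path.latency_ms <= max_latency_ms
            and path.loss_pct <= max_loss_pct)


def choose_path(paths: Sequence[PathStats], max_latency_ms: float = 400.0,
                max_loss_pct: float = 0.0) -> Optional[str]:
    """Return the first usable path in priority order, or None."""
    for path in paths:
        if usable(path, max_latency_ms, max_loss_pct):
            return path.name
    return None
```

With `paths = [wan, cell]`, traffic stays on the WAN until its latency or loss crosses the threshold, then moves to cell, and moves back once the WAN measurements recover.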

also, on the cellular backup: keep in mind that there are legacy, grandfathered unthrottled/unlimited plans available. i have a few in use. $140 per month and i use around 600g per month.

jmpfas, thank you for the comments. Yes, I am aware of those plans and have priced them out. It’s a bit of a process, but certainly worth it if a reliable signal can be achieved. We’ll see tomorrow how the installation goes!