“Not available - link failure, no data received” error on WAN port with Starlink connection

So I have a Max BR1 Pro 5G with a VZW cellular link and a Starlink connection on the WAN port.
I have two PepVPN connections: one is set to SpeedFusion Connect (Auto), and the other is set up as a tunnel back to a SpeedFusion Hub virtual appliance at an office location.

The SFC tunnel seems to be using both my VZW cellular and Starlink WAN connections, but the SpeedFusion Hub VPN is only using the VZW link and shows the error “Not available - link failure, no data received”. I can pass traffic over this tunnel, but I need it to work over the cellular WAN connection, the Starlink connection, or both.

I have bypassed the Starlink router, and the BR1 WAN interface has a 100.78.84.XX address on it, which I think is a CGNAT IP from Starlink. On the SpeedFusion Hub side, I am NATing from a public static IP to the appliance’s NAT IP on UDP 4500 and 32015 and TCP 4500 and 32015.
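To check the hunch that 100.78.84.XX is a CGNAT address, a quick sketch using Python’s standard `ipaddress` module compares it with 100.64.0.0/10, the shared address space that RFC 6598 sets aside for carrier-grade NAT. Because the last octet is elided in the post, the sketch tests the whole 100.78.84.0/24 rather than guessing a host; the function name `is_cgnat` is just illustrative.

```python
import ipaddress

# RFC 6598 "Shared Address Space", reserved for carrier-grade NAT.
CGNAT_SPACE = ipaddress.ip_network("100.64.0.0/10")


def is_cgnat(address_or_prefix: str) -> bool:
    """Return True if the address or prefix lies entirely inside 100.64.0.0/10."""
    net = ipaddress.ip_network(address_or_prefix, strict=False)
    # subnet_of() raises TypeError across IP versions, so check IPv4 first.
    return net.version == 4 and net.subnet_of(CGNAT_SPACE)


# The last octet is elided in the post, so test the whole /24 instead of guessing it.
print(is_cgnat("100.78.84.0/24"))  # True
print(is_cgnat("203.0.113.10"))    # False (documentation address, not CGNAT)
```

Since the whole /24 is inside 100.64.0.0/10, any value of the last octet is a CGNAT address, which generally means the Starlink side cannot accept unsolicited inbound tunnel traffic.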

Am I missing something? Why won’t the VPN to the FusionHub use both of my WAN links?

I have also seen this on my tunnel that includes the Starlink WAN. My best guess is that it has to do with the public IP in front of the CGNAT changing, or something similar. I don’t see any associated DHCP renewals or IP changes at the WAN link level.

I believe I set my tunnel up to use TCP and it got a bit better: there was more resiliency, but it wasn’t foolproof. I would have to disconnect and reconnect the Starlink WAN to get it to rejoin the tunnel.

My guess is that the VPN layer requires symmetric routing and Starlink doesn’t always provide it.

Peplink could/should implement some kind of “retry on failure” option in my humble opinion. Simply throwing an error seems a bit strange for something that is “unbreakable”. It should try to self-heal after some amount of time.
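The “retry on failure” idea could look something like the exponential-backoff loop below. This is a hypothetical sketch, not a Peplink feature or API: `try_connect` stands in for whatever would re-attempt the tunnel on one WAN link, and the delay values are arbitrary assumptions.

```python
import time
from typing import Callable


def connect_with_retry(
    try_connect: Callable[[], bool],   # hypothetical: re-attempts the tunnel on one WAN link
    max_attempts: int = 6,
    base_delay: float = 5.0,           # seconds; arbitrary assumption
    max_delay: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry with exponential backoff.

    Returns the attempt number that succeeded, or 0 if every attempt failed.
    Delays double after each failure (5 s, 10 s, 20 s, ...) up to max_delay.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if try_connect():
            return attempt
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return 0


# Simulated link that fails twice, then recovers; record delays instead of sleeping.
outcomes = iter([False, False, True])
waits = []
print(connect_with_retry(lambda: next(outcomes), sleep=waits.append))  # 3
print(waits)  # [5.0, 10.0]
```

The cap on the delay keeps a link that has been down for a while from waiting hours between attempts, while the doubling avoids hammering a path that is genuinely broken.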

I have found very subtle issues with dual SpeedFusion tunnels on CGNAT providers.

#1 Use a port other than 4500, since that is the default IPsec NAT-T UDP port and sometimes gets special attention at a NAT/CGNAT layer. I start at 4501. (Adjust both sides of the tunnel.)

#2 Use a different port for each SpeedFusion concentrator. (If #1 isn’t enough.)
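The two port tips can be expressed as a small helper that hands each concentrator its own UDP port, counting up from 4501 and never landing on 4500. The concentrator names and the `assign_tunnel_ports` function are hypothetical; whatever ports it picks still have to be configured on both sides of each tunnel, and in any port forwarding in front of the hub.

```python
from typing import Dict, Iterable

NAT_T_PORT = 4500  # IPsec NAT-T port that some NAT/CGNAT layers special-case
MAX_PORT = 65535


def assign_tunnel_ports(concentrators: Iterable[str], start: int = 4501) -> Dict[str, int]:
    """Give each concentrator its own UDP port, counting up from `start` and skipping 4500."""
    ports: Dict[str, int] = {}
    port = start
    for name in concentrators:
        if port == NAT_T_PORT:
            port += 1
        if port > MAX_PORT:
            raise ValueError("ran out of UDP ports")
        ports[name] = port
        port += 1
    return ports


# Hypothetical concentrator names; set each port on BOTH ends of its tunnel.
print(assign_tunnel_ports(["office-fusionhub", "backup-fusionhub"]))
# {'office-fusionhub': 4501, 'backup-fusionhub': 4502}
```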

Without doing a packet capture at each location, you can’t figure out which part of the network is dropping the packets.
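A packet capture is the real answer, but as a rougher first check, a UDP echo probe can show whether a given port makes it across the path at all, for example 4500 versus 4501. This sketch swaps in a different technique from packet capture and assumes you can temporarily run the hypothetical `run_echo_responder` on the hub side on the port under test (with the tunnel stopped so the port is free); a SpeedFusion peer itself does not echo datagrams.

```python
import socket


def udp_echo_probe(host: str, port: int, payload: bytes = b"probe", timeout: float = 2.0) -> bool:
    """Send one UDP datagram and return True only if the same bytes come back in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(payload, (host, port))
            data, _ = sock.recvfrom(2048)
        except OSError:  # covers timeouts and ICMP-reported errors
            return False
        return data == payload


def run_echo_responder(sock: socket.socket) -> None:
    """Hypothetical hub-side helper: echo exactly one datagram back to its sender."""
    data, addr = sock.recvfrom(2048)
    sock.sendto(data, addr)
```

A `False` result only says that no echo came back; like the tunnel error itself, it does not tell you which hop dropped the packet.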

I will try changing the ports for this specific tunnel to something other than the defaults.

Also, I know that a traditional IPsec tunnel uses IP protocols 50 (ESP) and 51 (AH), yet Peplink does not ask for these to be forwarded through NAT. They only ask for UDP 4500 and TCP 32015.

I went ahead and put the FusionHub virtual appliance outside my firewall and gave it a public static IP, which fixed the problem, but of course this isn’t an ideal solution.

I read somewhere on a Peplink site that for a tunnel to build, at least one of the two sides must have a public static IP. The VZW side does, but the Starlink side doesn’t.

If a provider won’t NAT the bare IPsec protocols 50 and 51 (many firewalls and CGNATs won’t), clients usually switch to NAT-T encapsulation, which uses UDP port 4500. NAT gateways have specific handling code for that port and assume that traffic on it should get this special handling. The PepVPN/SpeedFusion UDP packets are not NAT-T; they are Peplink’s own protocol, which can’t be adjusted. When you move to port 4501, the CGNAT engine handles it as a run-of-the-mill UDP stream.
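To make the port-4500 argument concrete, the sketch below shows how an RFC 3948-aware device demultiplexes UDP traffic on port 4500: a lone 0xFF octet is a NAT-keepalive, four zero octets (the non-ESP marker) mean IKE, and anything else is parsed as ESP with the first four octets as the SPI. Whether Starlink’s or Verizon’s CGNAT actually does this is unknown; the function is a hypothetical illustration of why a non-NAT-T protocol on 4500 can be mistaken for ESP, while the same bytes on 4501 are ordinary UDP.

```python
NAT_T_PORT = 4500
ESP_HEADER_LEN = 8  # SPI (4 octets) + sequence number (4 octets)


def classify_udp_payload(dst_port: int, payload: bytes) -> str:
    """Hypothetical RFC 3948-style demultiplexing of UDP traffic, for illustration only."""
    if dst_port != NAT_T_PORT:
        return "plain-udp"            # no special handling: an ordinary UDP flow
    if payload == b"\xff":
        return "nat-keepalive"        # RFC 3948 NAT-keepalive: one 0xFF octet
    if payload[:4] == b"\x00\x00\x00\x00":
        return "ike"                  # non-ESP marker precedes an IKE message
    if len(payload) < ESP_HEADER_LEN:
        return "malformed"
    return "esp"                      # first 4 octets would be read as the ESP SPI


# A made-up, non-IPsec payload standing in for a SpeedFusion packet.
other_protocol = bytes([0x01, 0x02, 0x03, 0x04, 0x00, 0x00, 0x00, 0x01])
print(classify_udp_payload(4500, other_protocol))  # esp
print(classify_udp_payload(4501, other_protocol))  # plain-udp
```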

What is required is that one end of the tunnel has a non-CGNAT, Internet-routable endpoint. It can be behind NAT, but it needs to receive all of the port 450? traffic. It can be dynamic with DNS resolution; it doesn’t need to be static. We will call that end the HUB. In your system, FusionHub and SFC are both hubs. Given those hub endpoints, all of the remotes’ WANs can be CGNAT. As long as the firewall allows the traffic, the remote side doesn’t need to be exposed.

What you can’t do is a tunnel that is CGNAT → CGNAT end to end.
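The rule from the last two paragraphs can be written as a small predicate: a tunnel can form as long as at least one end is a reachable hub, meaning not behind CGNAT and actually receiving the tunnel traffic, whether directly or via port forwarding. Static versus dynamic (DNS-resolved) addressing doesn’t matter for this check. The `Endpoint` fields and names are hypothetical simplifications, not Peplink settings.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    behind_cgnat: bool           # WAN address is carrier-grade NAT (e.g. inside 100.64.0.0/10)
    receives_tunnel_ports: bool  # tunnel traffic reaches it, directly or via port forwarding


def is_hub(e: Endpoint) -> bool:
    """A usable hub is Internet-routable (not CGNAT) and actually gets the tunnel traffic."""
    return not e.behind_cgnat and e.receives_tunnel_ports


def can_form_tunnel(a: Endpoint, b: Endpoint) -> bool:
    """At least one end must be a hub; CGNAT on both ends cannot work."""
    return is_hub(a) or is_hub(b)


starlink_remote = Endpoint("BR1 via Starlink", behind_cgnat=True, receives_tunnel_ports=False)
office_hub = Endpoint("office FusionHub", behind_cgnat=False, receives_tunnel_ports=True)
other_cgnat = Endpoint("another CGNAT remote", behind_cgnat=True, receives_tunnel_ports=False)

print(can_form_tunnel(starlink_remote, office_hub))   # True
print(can_form_tunnel(starlink_remote, other_cgnat))  # False
```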

The problem I have seen with UDP 4500 isn’t logical, but getting Starlink and VZW support to do packet captures at their CGNAT engines just isn’t happening. The only explanation I can think of is that the CGNAT system has thousands of port 4500 sessions and eventually gets confused when two almost identical SF tunnels pass through it. I have had VZW work just fine in one state, and then, when remote in another state, one SF tunnel would be down, with no config changes on my end. That was probably different CGNAT hardware, or just different traffic levels, a different day, etc. Once I moved the tunnels up to 4501 or 4503, they connected everywhere.