I am testing SpeedFusion in our environment with a Balance 710 as the hub and two 380s as the spokes, all running firmware 7.1.2 build 2574. The hub is our corporate HQ. The spokes are branches with 4 IP subnets each. Each Peplink has a live Internet connection on WAN2, and an Ethernet subnet on WAN1 that connects all three Peplinks to simulate our private Metro-Ethernet WAN.
To test fault tolerance, the Peplink at each site is put in Drop-in mode with LAN Bypass, meaning LAN and WAN1 become just a physical conduit if the Peplink drops dead. So far things have largely worked. There is one nagging issue, however.
I noticed sporadic ICMP packet drops when pinging from a client PC at each remote site. The SpeedFusion status on the 380s showed the packet loss rate on the WAN1 link momentarily spiking to 30-40 pkt/s. I further noticed that the PepVPN on WAN1 would momentarily tear down and rebuild without leaving any messages in the Event Log. It would briefly display “Link failure, no data received” as it tore down the VPN on WAN1. I verified all cable connections and swapped out switches, to no avail.
At the same time, VPNs over WAN2 were rock solid and clients never missed a ping to Google’s DNS servers (Internet traffic is not routed through PepVPN unless WAN2 health check fails).
Based on my previous testing experience in my initial Peplink POC, I knew this had to have something to do with the way Peplink makes routing/path selection decisions. When the Peplink doesn’t know where to forward a packet, or puts packets on the wrong path, it creates issues like this. But it continued to happen even after I disconnected all WAN2 connections.
So I started tweaking the Health Check settings under each WAN interface, and later the Link Failure Detection Time under PepVPN Settings within the SpeedFusion setup. When I set Link Failure Detection Time to Recommended (approx. 15 secs), I finally got a stable network. The SpeedFusion VPN over WAN1 doesn’t tear down any more, and my average packet loss dropped to around 1% on the WAN1 links, measured by pinging. A throughput test on WAN1 with PepVPN without encryption can now reach 80%-90% of the bandwidth I defined for WAN1.
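For anyone who wants to reproduce the ping-based loss figure, here is a minimal sketch of one way to do it. It assumes Linux iputils-style reply lines containing `bytes from` and `icmp_seq=`; the function name `summarize_ping` and the sample address 10.0.0.1 are hypothetical and not taken from my setup. Besides the loss percentage, it reports the longest run of consecutive missed replies, which helps tell a steady trickle of drops apart from a burst such as a tunnel tearing down.

```python
import re

# Match only successful echo replies; error lines such as
# "From ... icmp_seq=3 Destination Host Unreachable" are ignored.
REPLY_RE = re.compile(r"bytes from .*icmp_seq=(\d+)")

def summarize_ping(lines, sent):
    """Return (loss_percent, longest_gap) for probes numbered 1..sent.

    lines: iterable of ping output lines (assumed iputils format).
    sent:  number of echo requests that were issued.
    """
    received = set()
    for line in lines:
        m = REPLY_RE.search(line)
        if m:
            received.add(int(m.group(1)))

    lost = 0
    longest = current = 0
    for seq in range(1, sent + 1):
        if seq in received:
            current = 0
        else:
            lost += 1
            current += 1
            longest = max(longest, current)
    return 100.0 * lost / sent, longest

# Hypothetical sample: replies 3 and 4 are missing out of 5 probes.
sample = [
    "64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.41 ms",
    "64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.39 ms",
    "From 10.0.0.1 icmp_seq=3 Destination Host Unreachable",
    "64 bytes from 10.0.0.1: icmp_seq=5 ttl=64 time=0.44 ms",
]
loss, gap = summarize_ping(sample, sent=5)
print(f"loss={loss:.0f}% longest_gap={gap}")
```

A long capture with a low loss percentage but a large longest gap would point at brief outages rather than a lossy link.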
If this were a real-world test, I could live with the results. But the problem is that the WAN1 interfaces are connected back to back on an enterprise-quality access switch. There shouldn’t be any packet loss. Even the WAN2 connections, which are real-world Internet connections across various ISPs, do not have any packet drops the entire time.
Without a Cisco-like CLI I have no way of debugging. I have tried the SSH CLI. I have clicked on every ? icon and exhausted all the hidden fields that I could possibly try. And the user manual is practically useless for advanced troubleshooting.
Any insight into this will be tremendously appreciated, as I am way past my planned deployment date.