Balance 30 freezing under load? Would upgrading to a Balance One help?


#1

Configuration:
B30 - WANs via cable and bonded DSL - US (metropolitan) location
B30LTE - WANs via satellite, cellular (Verizon LTE) - US (mountain) location
B20 - WAN via DSL - Europe (remote) location
PepVPN between the B30 and each of the other two
Issue:
The B30 (the hub of the PepVPN star) will occasionally freeze - no web admin access, no routing, still responding to pings.
It seems to be correlated with load - when a WAN fluctuates and/or when the PepVPN connections fluctuate, or when I move from one VPN connection to two.
There is a ticket on the problem, but if the issue is one of non-graceful degradation under load I was wondering if moving to a Balance One would make a difference?
Or in other words - is the Balance One significantly more capable than a Balance 30 (HW4) in terms of how it can handle VPN and general traffic load, and yo-yo WAN connections?
I would love to see such differences supported by information about the specific hardware differences (CPU etc.).
Please advise.


#2

Hey Zegor,

In terms of PepVPN speeds and number of peers, a Balance One won't make a lot of difference.
If you want more speed while using PepVPN, or more peers, you might take a look at the 305/380.

You can take a look at the specs at the link below. If any more questions pop up, feel free to ask.


#3

I am fine with the current speed - it is limited by my WAN connections. What I am less happy about is that the B30 seems to become unresponsive under stress. If that is simply a matter of processing power the question becomes - is the Balance One processor more capable than the B30 - or do they run the same system as far as load handling and capacity is concerned?


#4

Zegor - if the B30 responds to pings, it hasn't frozen. Not sure what your problem is, but the device is not locked. I had a similar problem which turned out to be a fiber-to-ethernet adapter that connected to my router. I would try to isolate your problem by removing pieces of the network. If you think it's related to load, you could create data transfers to stress test. The device can't be asked for more throughput than the total WAN speed available, and the links you describe aren't that fast. In my experience, DSL can be unpredictable, so I'd start by testing without the DSL link on the B30.


#5

Thanks.
If there are equipment failures on the WAN side, those should be the cases where the B30 is supposed to shine: continue functioning on the other WAN lines, right?
Instead: No web access, no routing from the LAN to any of the WANs.


#6

Zegor, you may be correct, I’m just suggesting things to experiment with to find the real problem. In my opinion if the B30 responds to a ping, it isn’t frozen. Are you testing a PING on the LAN side or the WAN side? The equipment failure example I gave actually happened to me, but you are correct it had no effect on the other ports including the LAN.
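
To make the "ping answers but nothing else works" state easier to pin down, the three symptoms from this thread (LAN ping, web admin access, routing to the WAN) can be probed separately and recorded. The following is only a rough sketch: the function names are made up for illustration, it relies on the operating system's `ping` command, and it is not a Peplink tool.

```python
import socket
import subprocess
import sys


def ping_ok(host: str, timeout_s: int = 2) -> bool:
    """Send a single ICMP echo via the system ping command."""
    count_flag = "-n" if sys.platform.startswith("win") else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def tcp_ok(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection (e.g. to the web admin port) opens."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def classify(lan_ping: bool, admin_tcp: bool, wan_reachable: bool) -> str:
    """Map the three probe results to a short label (labels are illustrative)."""
    if lan_ping and admin_tcp and wan_reachable:
        return "healthy"
    if lan_ping and not admin_tcp and not wan_reachable:
        return "answers ping only"
    if not lan_ping and not admin_tcp and not wan_reachable:
        return "fully unresponsive"
    return "partial failure"
```

A run from a LAN host might call `ping_ok` and `tcp_ok` against the router's LAN address and admin port, plus `ping_ok` against some host beyond the WAN, and log `classify(...)` with a timestamp. The addresses and port depend on the local setup and are not given in this thread.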


#7

I opened a ticket on this, and the diagnosis seems to be simple: One of the WAN connections is faster than 150 Mbps (the throughput rating of the B30) and when saturated the B30 does not degrade gracefully, nor does it throttle the incoming traffic to comply with its capability limits. It simply dies (except for responding to pings from the LAN).
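
The arithmetic behind that diagnosis is simple enough to write down. A small sketch, assuming made-up WAN speeds (the thread only states that one link exceeds the 150 Mbps rating, not the actual figures):

```python
B30_RATED_MBPS = 150  # throughput rating quoted in this post


def overload_margin(wan_down_mbps, rated_mbps):
    """Mbps by which the combined WAN download rate exceeds the router rating.

    A positive result means the WANs together can deliver more than the
    router is rated to forward.
    """
    return sum(wan_down_mbps.values()) - rated_mbps


def links_over_rating(wan_down_mbps, rated_mbps):
    """Names of WANs that exceed the rating on their own."""
    return sorted(name for name, rate in wan_down_mbps.items() if rate > rated_mbps)


# Hypothetical link speeds, for illustration only.
example_wans = {"cable": 200.0, "bonded_dsl": 20.0}
margin = overload_margin(example_wans, B30_RATED_MBPS)
offenders = links_over_rating(example_wans, B30_RATED_MBPS)
```

With these assumed numbers the cable link alone is over the rating, which matches the situation described above.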

I also learned that the capacity settings for each network (in the network tab) do not limit the download direction.

Establishing QoS rules to limit the overall traffic seems to have corrected the issue (the network has been stable, even under full-throttle stress).

I am not quite ready to close the ticket, but I am optimistic.

Thanks to Ron for being a very responsive support person.

A feature complaint to Peplink: A router really needs to degrade gracefully when the connecting (set of) WAN(s) has a higher bandwidth than the router does. Essentially, a QoS-like default rule that limits the bandwidth to the capacity of the router, or (if that is too aggressive), a means to throttle back when imminent failure is likely. The QoS rules of the B30 are not suited for this kind of global capacity management (though they have other good uses).
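
To illustrate the kind of global cap I mean, here is a minimal token-bucket sketch. A token bucket is a generic rate-limiting technique; this is not how the B30 or any Peplink firmware works, and the class and parameter names are my own.

```python
class TokenBucket:
    """Admit traffic at a sustained rate with a bounded burst."""

    def __init__(self, rate_bytes_per_s, burst_bytes, now=0.0):
        self.rate = rate_bytes_per_s   # refill rate
        self.capacity = burst_bytes    # maximum stored tokens
        self.tokens = burst_bytes      # start with a full bucket
        self.last = now                # time of the last refill

    def allow(self, packet_bytes, now):
        """Refill for the elapsed time, then admit the packet if tokens suffice."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # over the cap: the caller would drop or queue the packet
```

A router-wide cap at the 150 Mbps rating would correspond to something like `TokenBucket(rate_bytes_per_s=150e6 / 8, burst_bytes=...)`, with the burst size chosen to taste. The clock is passed in explicitly so the behavior is easy to check by hand.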


#9

Earlier I wrote: "Establishing QoS rules to limit the overall traffic seems to have corrected the issue (the network has been stable, even under full-throttle stress)."

This did not correct the problem.

Two solutions suggest themselves: (1) simply downgrade the connection from the cable modem to the B30 to 100 Mbps (there is a setting for that), or (2) upgrade the router to a Balance One Core.

I don’t have time (nor do my coworkers have the patience) for more experiments with network downtime, and reducing the speed of the pipe to 100 Mbps seems a less than optimal solution, anyway.

A Balance One has been installed as a replacement, and seems to be working well - hitting some 205 Mbps of WAN throughput on a stress test…