How to know if your router is overloaded?

We initially started using our Balance One Core with 50 to 60 “clients” in mind. After 3 to 4 years of use, our network has grown a bit.

We now have on the physical network:

  • 3 servers
  • 1 PBX
  • 8 switches
  • 18 access points
  • 2 wireless bridges
  • 50 smartphones
  • 45 computers
  • 8 printers
  • 34 student guest devices

To the scenario above, we will add 80 Chromebooks and remove the student guest devices from the network, for a final total of 210 devices.

Actually, a lot of these devices use only the local network, with no traffic to the WAN (think of all the switches, access points, printers and servers). However, because of the VLANs, there may be quite a lot of internal traffic routed by the Balance One.

Let’s say that the potential number of clients accessing the Internet could be up to 180 devices, although not all devices will be accessing the Internet at the same time.

Basically I have never experienced high CPU usage on the Balance One (it always runs between 10% and 20%), but I want to ensure that both local routing and WAN routing quality are not affected by the number of “clients”.

How can I ensure this?
What symptoms should be taken into account to prove that the router is suffering? Jitter, packet loss, high CPU? Is any upgrade required? Suggestions?
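One way to put numbers on the jitter and packet loss symptoms is to summarize a series of round-trip probes (for example, pings to a fixed host) taken at quiet and busy times. This is only a minimal sketch: the `summarize_probes` name and its result fields are made up for illustration, and jitter is approximated here as the mean absolute difference between consecutive replies, which is simpler than the smoothed estimator used by RTP.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize round-trip samples in milliseconds; None marks a lost probe."""
    received = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(received)
    loss_pct = 100.0 * lost / len(rtts_ms) if rtts_ms else 0.0
    # Simplified jitter: mean absolute change between consecutive replies.
    # Lost probes are skipped, so neighbours are the replies that did arrive.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "loss_pct": loss_pct,
        "avg_rtt_ms": mean(received) if received else None,
        "jitter_ms": mean(deltas) if deltas else 0.0,
    }


if __name__ == "__main__":
    # Illustrative samples only, not measurements from this network.
    print(summarize_probes([20.0, 22.0, None, 21.0, 25.0]))
```

Comparing the same summary from an idle period and a busy period would hint at whether load on the router, rather than WAN quality alone, is moving the numbers.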

Maybe @MartinLangmaid, @Don_Ferrario and @sitloongs can throw in a few cents on this?

Looking at the Balance One Core product details I would be prepared for an upgrade.
Monitor router throughput and CPU usage to see if you are close to the recommended maximum.
Likely your users will let you know when they start experiencing issues (slow internet, choppy voice calls) if you reach a bottleneck.

As you mentioned, it depends on how hard you make the Balance One Core work, but the number of users and devices in your network is a lot for a Balance One Core, which is recommended to support between 1 and 60 users.
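The "close to the recommended maximum" check can be sketched as a simple ratio test. Everything named here is an assumption for illustration: `check_router`, the 80% `warn_ratio`, and the example figures are placeholders, and the rated throughput has to come from the product datasheet for your exact model and WAN configuration.

```python
from typing import List


def check_router(peak_mbps: float, rated_mbps: float,
                 peak_cpu_pct: float, warn_ratio: float = 0.8) -> List[str]:
    """Return warnings when observed peaks approach the rated limits."""
    if rated_mbps <= 0:
        raise ValueError("rated_mbps must be positive")
    warnings = []
    # Throughput is compared against the datasheet figure supplied by the caller.
    if peak_mbps / rated_mbps >= warn_ratio:
        warnings.append(f"throughput at {100 * peak_mbps / rated_mbps:.0f}% of rated")
    # CPU is already a percentage, so it is compared against 100.
    if peak_cpu_pct / 100.0 >= warn_ratio:
        warnings.append(f"CPU at {peak_cpu_pct:.0f}%")
    return warnings


if __name__ == "__main__":
    # Placeholder numbers: 500 is NOT a real datasheet value.
    print(check_router(peak_mbps=100, rated_mbps=500, peak_cpu_pct=20))
    print(check_router(peak_mbps=450, rated_mbps=500, peak_cpu_pct=100))
```

The threshold is a judgment call; a lower `warn_ratio` simply gives earlier notice.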

Thanks for chiming in.
I am also surprised to say that CPU usage is currently very low, and TLS clients are doing well, apart from a slight echo on some softphones. This probably means that few clients are active on the Internet at once and that the resources they request are small. For instance, concurrent video playback is very rare, software updates run only at night (and most Apple/iCloud updates are blocked), plus we have a local caching server for Chromebook updates… I don’t want to throw away money either, but at the same time I wish to keep performance at its best.
We have 5 WANs (70/20 Mb/s each) running and never reach bandwidth saturation.
BTW there’s no logging facility to trace CPU usage over time, or am I missing something?
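If the device itself keeps no history, one workaround is to poll the CPU figure from another machine and append it to a file. This is a hedged sketch, not a Peplink feature: `log_cpu` is a made-up helper, and `read_cpu_percent` is a hypothetical callable you would have to supply yourself, using whatever monitoring interface your router actually exposes.

```python
import csv
import time
from datetime import datetime, timezone
from typing import Callable


def log_cpu(read_cpu_percent: Callable[[], float], path: str,
            samples: int, interval_s: float = 60.0) -> None:
    """Append timestamped CPU readings to a CSV file."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for i in range(samples):
            # One row per sample: UTC timestamp, CPU percentage.
            writer.writerow([datetime.now(timezone.utc).isoformat(),
                             read_cpu_percent()])
            f.flush()  # keep the file current if the script is interrupted
            if i < samples - 1:
                time.sleep(interval_s)


if __name__ == "__main__":
    # Stand-in reader returning fixed values; replace it with a real probe.
    fake_readings = iter([12.0, 18.5, 100.0])
    log_cpu(lambda: next(fake_readings), "cpu_log.csv", samples=3, interval_s=0)
```

A file like this makes it possible to see whether spikes line up with particular times of day rather than relying on a glance at the dashboard.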

Echo on the softphones is not a router limitation. Softphones generally work poorly. I’ve tried many times!

Ultimately there is no way to know if you need a router upgrade until you try. It’s not the number of devices; it’s how much data each of them needs to pull. How much internet bandwidth do you have? The devices can’t move more data than you have coming into the building.

Do the servers move a lot of data out to the client devices?

You have a lot of people working through a possible single point of failure. You could have a second identical Balance in high availability mode, but HA mode brings complications. I keep a second device and have configuration backups so I could have it running in minutes. In terms of cost you need to consider either one (1) more Balance One, or two (2) of whatever device you decide to use.

Ultimately there is no way to know if you need a router upgrade until you try. It’s not the number of devices; it’s how much data each of them needs to pull. How much internet bandwidth do you have?

For now we have 5 WANs (effective 70/20 Mb/s each) on the Balance One, so that’s a potential 350/100 Mb/s (non-aggregated). I have never seen all WANs pumping data to their limit. Unfortunately there’s no logging facility for this, but I have rarely experienced network clogs when using the Internet; it has happened maybe 2 or 3 times in 4 years.
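The 350/100 figure is just the per-link capacities added up, which can be sketched as below. The `total_capacity` helper is made up for illustration, and the per-device share uses the 180-device worst case from the first post; since the links are not aggregated, the totals are a ceiling across all links rather than something one connection could reach.

```python
from typing import Iterable, Tuple


def total_capacity(wans: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Sum (download, upload) capacities in Mb/s across WAN links."""
    down = up = 0.0
    for link_down, link_up in wans:
        down += link_down
        up += link_up
    return down, up


if __name__ == "__main__":
    wans = [(70, 20)] * 5  # five links at 70/20 Mb/s, as described above
    down, up = total_capacity(wans)
    print(f"potential total: {down:.0f}/{up:.0f} Mb/s")
    # Worst case from the first post: 180 devices all active at once.
    print(f"download share per device: {down / 180:.2f} Mb/s")
```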

The servers do not move a lot of data.

Our backup solution is an EdgeMax router, which was used each time the Peplink was toasted by lightning strikes coming in through the modems (this has happened twice already).

FYI: I regularly move a 15GB file from one LAN-side device to another. This causes a Surf SOHO to run at 100% CPU. Not sure if this would be true for the Balance One too, but just be aware that moving lots of LAN data can also impact CPU usage.

So basically, since the CPU is always between 10% and 25% there should be no worry? Can we assume the CPU usage will define when it is time to move to a higher model?

That’s avoidable. Last fall I wrote this document: http://download.peplink.com/resources/2020-2011_safe_instructions.pdf. Pages 12 et seq. discuss such issues. In particular, high-quality UPSs, panel-based surge suppressors and “light gaps” are good ideas. The cost to implement them is often far less than the cost of replacing equipment. We have never found it necessary to keep an extra router on the shelf when proper protective measures are employed.

@Rick-DC, we already had quality UPSs and surge suppressors, but the way in was through the phone cables. Now that we have set up “light gaps” it should be over.
Your thoughts on this topic?

I have a lot of thoughts about that – that’s why I wrote what I did. ;<) ;<)

Seriously … In the one photo you see a cable modem with a media converter alongside it. There’s also a lightning suppressor outside connected to a low impedance ground system. My experience is that the latter may or may not work. Regardless, your total exposure to failure should be the cable modem and the media converter. Incidentally, it’s not an accident that the cable modem is not placed in the nearby 7’ rack with the rest of the networking equipment. We don’t want to take the slightest chance of losing a high value component such as a router or network switch. So the network connection between “the outside” and the rack is only via fiber.

How about your thoughts on the current topic, that is, the potentially overloaded router?

Hi. I agree with what @Michael234, @Erik_B & @Don_Ferrario said. When looking at all you have behind your B1 Core it does look like a lot. But the best test is, as suggested, a look at CPU usage and throughput. Jitter, as you have suggested, is something to watch: while it is most likely an artifact of WAN quality, it will also appear as your router approaches its hardware limits.

While I don’t think I have ever seen quite that much “stuff” behind a B1, “if it works, it works.”

Added: We had a situation about a year ago sorta similar to yours. The owner had a Balance One Core and we upgraded it to a Balance 210. The B1C was doing fine but when the B210 was installed the owner appreciated the much more snappy GUI response, faster boot time, considerably faster PepVPN throughput, etc. Since, they’ve gone on to add HA mode. So, while the B1C worked fine in a technical sense, they appreciate the upgrade.

Today I finally witnessed a 100% CPU load on the router while a computer was downloading from the Internet. Strangely, download bandwidth was about 100 Mb/s, out of 5 WANs (each with 70 Mb/s download capacity). However, the 120 clients connected on the LAN might have generated some traffic on the network, which in turn contributed to the CPU spike.

As a side note, I fired up our SIP/TLS app to verify voice quality while the Balance One was under stress, and there was no glitch at all.

Your thoughts?

I have not spent much time on this, but my experience has been that LAN-to-LAN data transfers show up as CPU usage, period. That is, they are not visible on any bandwidth report, so they are somewhat invisible. This should be easy to verify.

If some of the CPU usage is LAN to LAN, I have no idea how to track down where it comes from. If you knew where, then a switch external to the router would probably keep that LAN-side traffic away from the router.

Also note that the throughput specifications for the Balance One are significantly reduced when five WANs are in use, as in your situation. Your use of this product with so many clients still makes me nervous. But, ultimately, “if it works, it works.”

It looks like the only upgrade device which supports 5 WANs is the 510… Huge cost difference! Any idea?

Maybe @MartinLangmaid can throw two cents on this?

In my opinion, the only traditionally sensible/simple upgrade is actually the SDX (future proofing / hardware platform capability etc) but that costs more than the B580.

If you’re restricted by budget then you’ll need to get creative. If it was me, I’d buy a B310X and then physically segregate the networks to have two gateways, the Balance One managing the slower WAN links for use by your low bandwidth use cases (VoIP / guest wifi etc) and then the B310 for everything else that matters.

Nice thoughts,
we could look for a second-hand 580 and keep the Balance One for low-priority stuff or as a backup unit.

It is strange that no other brand offers as many multi-WAN models as Peplink does. You get dual-WAN models at most.

Is this due to the SD-WAN market, which is growing and redefining architectures, including old-school multi WAN approaches?

It’s because no one has been building multi-WAN routers for as long as Peplink or understands them as well as they do. And internet speeds are catching up too. In places where I would have used a B580 in the past (for bonding 5 LTE connections or 4 DSLs) I’d now deploy a dual-WAN solution, because they can generally get 70 Mbps on fiber and 50-100 Mbps on LTE-A.

Most other vendors only really saw dual WAN as a requirement for failover, then added some support for load balancing later, almost as an afterthought. Peplink did it from the start.
