Hello,
I have a Surf SOHO Mk3. I have a love-hate relationship with it, but I think we’ve found a way this weekend to get along and perhaps extend that relationship a little longer than expected.
SITUATION
- FW 8.3.0.
- Frequent lockups. Unable to reach the MANGA front-end or the SSH console. It was better on 8.1 but I needed the UDP relay, hence 8.3.0.
- Router would frequently fail to establish uplink on restart
- MANGA not responding, so unable to diagnose; rebooting twice or thrice would usually do the trick… the wife’s and kids’ frustration had to be managed too, understandably.
TL;DR
Disclaimer:
- the following worked for me. Do not blindly apply this configuration, as it has impacts on functionality.
- I expect most readers in this forum are somewhat versed in network tech and can RTFM.
1- head to MANGA/support.cgi - disable DPI (this greatly decreases QoS and Content Blocking capability, but at least the darn thing is stable), enable watchdog (why is it disabled?!)
2- head to MANGA/network.cgi → Network → [LAN] Network Settings → DNS Proxy Settings → Uncheck “Enable”, Uncheck “DNS Caching” (this may increase DNS latency, so if you have a cell backup WAN link, you might want to keep the proxy enabled, start with step 1 only and see how it goes)
3- Apply changes. Wait. Cross your fingers that it doesn’t crash due to CPU overload. If it does, reboot and try again.
4- Profit.
THE NOVEL
I’ve struggled a lot with my Surf SOHO Mk3 in recent months. Going by comments published out there, the unit got bad press from many users due to frequent crashes and network drops with recent firmware. During the pandemic, I demoted my Surf SOHO Mk3 from main house/office router to access point because it would frequently drop during Zoom calls. Before going that way, I had run full diagnostics on the wiring and on NIC stability. It puzzled me because the unit had been remarkably stable for years! It started to become unstable around FW 7, maybe? Before that, I loved the unit and could live with its throughput limit. But then it became unbearable; luckily we received a consumer router as a gift from a relative and it worked remarkably well, so the robust commercial-grade Surf SOHO was relegated to access point duty. How ironic.
As an access point, the Surf SOHO saw less network traffic and crashed less frequently… until the load on the access point started to increase again. Same story as before. Last Saturday, I was thoroughly pissed at the situation (an accumulation of everyone’s frustration) and decided to get to the bottom of it. (Interlude: I’ve been a software developer since the end of the 90s, I’ve coded embedded systems and I know a thing or two about debugging. This helps.)
I spent the whole Saturday rethinking the network topology/VLANs/subnets and whatnot, to segregate video streaming traffic, video conferencing, and the machines that need to share a subnet (Bonjour/mDNS fun)… while trying to keep things somewhat simple. I managed to reduce the load on the SOHO again.
But the Surf SOHO remained flaky. Less flaky. But still flaky. I was looking for a replacement… but the Surf SOHO model isn’t ready… almost there… not yet. Part of me was hesitating, considering alternatives… But there MUST be a reason why it suddenly became flaky. Let’s dig into the stack.
I noticed MANGA would get unusually slow… to… not responding. The router would still respond to ping, but everything else would appear dead. Connecting over SSH would be impossible. I suspected the system was entering a memory-thrashing condition that ended in a lockup. Fancy firewall rules? The firewall rules are not over the top (about 10-15 entries?) so that can’t possibly be it… frankly, if my old DD-WRT router from 2010 could do it… ya know…
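For anyone wanting to catch this "pings but dead" state as it happens, here is a minimal sketch that checks whether the web admin and SSH ports still accept TCP connections. It is not something I took from the router; the function names are mine, and the address and port numbers in the usage comment are assumptions to replace with your own settings.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False


def probe_router(host, ports, timeout=3.0):
    """Probe each named port; returns {service name: reachable?}."""
    return {name: tcp_port_open(host, port, timeout)
            for name, port in ports.items()}


def classify(results):
    """Summarise probe results as 'ok', 'degraded' or 'locked up'."""
    if all(results.values()):
        return "ok"
    if any(results.values()):
        return "degraded"
    return "locked up"


# Example usage (address and ports are assumptions, adjust to your setup):
#   results = probe_router("192.168.50.1", {"web admin": 443, "ssh": 22})
#   print(results, classify(results))
```

Run from cron every minute or so and logged with a timestamp, this at least tells you when the management plane died relative to what the network was doing at the time.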
So, how else to reduce CPU usage? And what else may consume memory? VPN, obviously… but I don’t have any VPN connection, so there is not much to gain there beyond killing the VPN daemon… which we can’t do, since SSH access is restricted. And yes, it’s probably BusyBox-based like most routers, but there is no way to simply disable the VPN and/or SpeedFusion daemons.
OK what else?
Anything with caching? DNS Proxy! Disable the damn thing, which was enabled. Oh wow, that helped, more than I expected. A few years ago, a page would pull from about 10 URL sources or so. Now? Facebook, Google, Microsoft, Apple URLs. Other ad services on top. CDN sites. The AWS API server URLs. All those cell phone apps trying to call home… all the Chrome tabs… the enhanced experience services in Windows and macOS… that’s a lot of entries, all things considered.
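Since turning off the DNS proxy may cost some lookup latency, it is worth measuring before and after. Below is a rough sketch, assuming you run it from a LAN client; the helper names and the hostname list are mine, purely for illustration. Keep in mind the client OS may cache answers itself, so treat the numbers as indicative only.

```python
import socket
import statistics
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return the lookup time in milliseconds, or None if resolution fails."""
    start = time.perf_counter()
    try:
        resolver(hostname, None)
    except OSError:
        # socket.gaierror (name not resolved) is a subclass of OSError.
        return None
    return (time.perf_counter() - start) * 1000.0


def summarise(samples):
    """Summarise a list of timings (None entries count as failures)."""
    ok = [s for s in samples if s is not None]
    return {
        "lookups": len(samples),
        "failures": len(samples) - len(ok),
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


# Example usage (hostnames are placeholders, pick ones you actually use):
#   names = ["example.com", "example.org", "example.net"]
#   print(summarise([time_lookup(n) for n in names]))
```

Run it once with the proxy enabled and once with it disabled, ideally with a longer list of names, and compare the median and the worst case.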
So that was better, but not there yet. The throughput was bad, even after the VLAN segregation and QoS were done. I went to the support.cgi page to review the settings. What’s DPI? Why does my router have a Dots Per Inch setting? Oh, Deep Packet Inspection. Oh. Like a Suricata CPU-killer type of thing? Turned it off.
- Throughput is back;
- Stability is back;
- I still have some QoS going on it seems;
- MANGA has never been that fast over long periods;
- I even reconfigured reboots from overnight to weekly!
Frankly, in retrospect, I don’t think the poor Surf SOHO has the CPU power to do DPI. It’s a great security feature but it’s too modern and too CPU-intensive for a single-CPU system. I think it was a Bad Software Design Decision to allocate that much CPU budget to this (Go on, change my mind!!). DNS Caching should carry a better warning on memory-constrained hardware, too. Better yet, delegate it to a separate system, say a container running on your NAS or something… if you can.
Hope this helps others.
If the system starts crashing again, I’ll report back, but so far so good, all systems are nominal.