SOHO: Lockups & Reboots


#1

My Surf SOHO locks up regularly (3 times last week); a hard power cycle fixes it for a bit. I have a support ticket open, but in the meantime I would like to schedule a regular (say 3 a.m.) reboot. Having gone through the manuals, I can’t find out whether this is possible or how to do it. I’m sure I’ve overlooked something. Does anyone have any ideas, please? Thanks


#3

Hi,

We’ll work with you to resolve the issue on the support ticket. However, for the benefit of others reading this, you can schedule a reboot either via InControl or from the device.

From InControl

Go to the group containing the device(s) and click Settings -> Device System Management.

Tick the “Scheduled Reboot - Managed” box and then select the daily / weekly schedule.

You can then use the Device Selection to only apply the reboot to devices with given tags, etc.

From the Device

Go to the LAN IP address and log in. In the address bar you will notice the word “index”; delete this and everything to the right of it, and replace it with support.cgi, as shown below:

http://192.168.50.1/cgi-bin/MANGA/support.cgi

Scroll down the page and you’ll be able to schedule a daily / weekly reboot.
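
For anyone who prefers to script the address change rather than edit it by hand, here is a minimal Python sketch of the step above. It only rewrites the URL string: it swaps the last path segment for support.cgi and drops anything after it. The function name `support_page_url` and the `index.cgi?option=example` tail in the example are made up for illustration; only the `/cgi-bin/MANGA/support.cgi` path comes from this post.

```python
from urllib.parse import urlsplit, urlunsplit


def support_page_url(admin_url: str) -> str:
    """Replace the last path segment (e.g. index.cgi) with support.cgi.

    Any query string or fragment to the right of it is dropped, which mirrors
    "delete this and everything to the right" in the manual steps.
    """
    parts = urlsplit(admin_url)
    # Keep the directory part of the path, discard the final segment.
    directory, _, _ = parts.path.rpartition("/")
    return urlunsplit((parts.scheme, parts.netloc, directory + "/support.cgi", "", ""))


# Hypothetical address as it might appear after logging in.
print(support_page_url("http://192.168.50.1/cgi-bin/MANGA/index.cgi?option=example"))
```

If your router uses a different LAN IP, pass that address instead; the path handling stays the same.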

Hope this helps,

Steve


#4

Thanks Steve - will try this tonight.
Dave


#5

Scheduled reboot works fine, and the SOHO has been rock solid for a few weeks (with a daily reboot) until tonight, when it locked up again with the status lights solid. Powering off and on worked for 10 minutes, then it locked up again (no access through the LAN). I can’t believe it should be this flaky (2 lockups today), and I am reluctantly considering returning it. Any thoughts, please?


#6

Hi Dave,

Did you also install the firmware 7.1.1? Here is a link:-

https://download.peplink.com/firmware/br1ac/fw-max_br1mk2_hotspot_sohomk3-7.1.1-build1342.bin

Thanks,

Steve


#7

Hi - sorry for the radio silence for so long. I installed 7.1.1, kept the daily reboot, and all was solid. I was just about to post an update saying all good when I got that dreaded call: “Internet has gone down”. We had 6 or 7 devices running on 2.4 GHz, and the lockup seems to happen when a device joins.

At that point, the OpenReach light stays solid amber and I can’t connect to the router, either wired or wirelessly. I need to turn the router off and on again, and it took a couple of goes as well. 2 hours later the same thing happened, again when someone turned on their device. My previous router, an ASUS RT-N66U, never had anything like this. Is there any logging at all on the SOHO to work out what happened? Thanks
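
While waiting for an answer about on-device logs, one option is to record the outages from the outside. The sketch below is an assumption-laden example, not a Peplink tool: it runs on a wired client, tries a TCP connection to the router’s web admin port, and writes a timestamp whenever reachability flips. The address 192.168.50.1 is the default LAN IP mentioned earlier in the thread; port 80, the 30-second interval, the log file name and all function names are illustrative choices.

```python
import socket
import time
from datetime import datetime


def is_reachable(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and unreachable hosts
        return False


def state_changes(samples):
    """Reduce (timestamp, reachable) samples to the points where the state flips."""
    changes = []
    previous = None
    for stamp, reachable in samples:
        if reachable != previous:
            changes.append((stamp, "up" if reachable else "down"))
            previous = reachable
    return changes


def watch(host: str, interval: float = 30.0, logfile: str = "router_watch.log") -> None:
    """Poll forever and append one line to logfile each time reachability changes."""
    previous = None
    while True:
        reachable = is_reachable(host)
        if reachable != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            with open(logfile, "a") as handle:
                handle.write(f"{stamp} {'up' if reachable else 'down'}\n")
            previous = reachable
        time.sleep(interval)


# To start logging from a wired machine: watch("192.168.50.1")
```

The resulting timestamps will not say why the router locked up, but they can be lined up against when devices joined the network.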


#8

@SOHO I would love to hear if you ever get a resolution to this problem. I am also curious to hear if others have been silently having these issues, and/or if support and engineering are already aware of these lockups and reboots, and working on a fix for the next release.

With 7.1 we’ve had occasional lockups exactly like yours, and since we went to 7.1.1 the router has been going down like clockwork every 4d10h ±6h; it never misses its schedule.
When I say it goes down, it does one of three things:

  1. Reboots by itself (that is the best case scenario, because it self recovers),
  2. The AP fails, and all WIFI devices lose access to the internet but remain connected to the router, and wired devices stay connected to the internet,
  3. The AP fails, and all WIFI devices lose access to the internet but remain connected to the router, and wired devices also lose access to the internet.

In case 2, logging in via a wired device shows no traces in the event log, so no hints there. In case 3, I can’t access the router from any device. For cases 2 and 3, a manual reboot (power cycle) is required. With 7.1 we only had case 3, at random: sometimes within a day of a reboot and sometimes weeks after…
On 7.1.1 it looks awfully like a memory leak of some kind, given how regular it is.

We have been able to get by with a daily reboot, but it would be great if this were ever understood so that we could get rid of those daily reboots.
We have tried many things, such as removing the cell backup link and using fewer WIFI SSIDs and fewer VLANs, in case it was a memory leak of some sort or the SOHO was just doing too many things, but none of this solved the problem or changed the frequency of events… So the daily reboot is the only prevention at this point… sad and mysterious…
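
As a quick sanity check on why a daily reboot avoids the regular 7.1.1 failures, the figures quoted above can be put side by side. This is only arithmetic on the numbers in this post (4d10h ±6h and a 24-hour reboot cycle); the variable names are illustrative.

```python
from datetime import timedelta


def hours(delta: timedelta) -> float:
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600


typical_uptime = timedelta(days=4, hours=10)  # "4d10h" from the post
spread = timedelta(hours=6)                   # "±6h" from the post
reboot_cycle = timedelta(days=1)              # daily scheduled reboot

earliest_failure = typical_uptime - spread
latest_failure = typical_uptime + spread
margin = earliest_failure - reboot_cycle

print(f"failure window: {hours(earliest_failure):.0f} h to {hours(latest_failure):.0f} h of uptime")
print(f"daily reboot leaves a margin of {hours(margin):.0f} h before the earliest failure")
```

That works out to a window of 100 to 112 hours of uptime, so the daily reboot restarts the router about 76 hours before the earliest regular failure. It does nothing for the random lockups, which were seen within a day of a reboot on 7.1.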

I am not opening a ticket because we have a workaround and I don’t have time to follow up these days. Also because the diagnostics report they will want includes the passwords for our networks, certs, machine names, etc… (see this topic), and this is against company policies and I really don’t have time to reset everything on our network (and get a new cert) after sharing this critical information.

Anyhow, I am curious to hear what others have experienced. And indeed, if there are some internal stats or logs we can access, to get a better understanding of these events on our own, it would be great.


#9

@SOHO

I found that you had initially opened a ticket for this issue. Would you please follow up on the ticket? The support team will work with you on it. The device needs to be investigated to get to the bottom of the problem.

@peparn

Sorry to hear that you are too busy to work much on this. Let’s see what feedback others have. It would be good if we could work on the issue together.

I’m a SOHO user as well :sweat_smile:, but I haven’t encountered the issue as reported. My device uptime is 94 days since the last firmware upgrade to 7.1.1.


#10

Hello @sitloongs,
yes indeed, let’s see if there is additional feedback coming in for this. Maybe the thread should be renamed by you or @SOHO to something like “SOHO: lockups and reboots” or similar to help draw attention to the discussion? Alternatively another thread could be started to collect feedback.
I am actually jealous of your uptime :wink: . I wish we could get even a fraction of that. On 7.1 we got up to a few weeks once, but on 7.1.1 it never went beyond 4d10h ±6h. And now we reboot daily so we get one day :laughing:.
I can see you have only one SSID in your configuration. If you remember from when we worked on a previous case, our configuration calls for multiple VLANs and SSIDs: at this point, 1 untagged LAN, 6 VLANs and 5 SSIDs on the AP. I don’t have the bandwidth to do active debugging with you at this time (sadly), but perhaps you could reuse the test configuration we created for our prior case (which mimicked our configuration) and see if you get any problems with it on 7.1.1 over a multi-day period? The challenge here, I am guessing, is that you need devices to connect and use the router for it to be realistic. It is really too bad I can’t look inside the internal logs to see if there are any hints there.
Hope this gives some ideas. Cheers!


#11

@peparn

Let me try to retrieve the configuration and see whether I can reproduce the issue :thinking::thinking:


#12

Great, I hope you are successful at reproducing this so that it can be addressed. With a bit of luck it does not depend too much on the volume of traffic and the number of devices connected. Looking forward to your findings.


#13

Having a similar issue with my SOHO. It’s partially frozen three times since installing the 7.1.1 firmware. Otherwise, it had been rock solid for the last 8 months it’s been in use.

Today was one of the freeze-ups. Anything wireless goes down completely (or almost completely). I have two SSIDs running, with minimal traffic, and they’re on VLANs that have a wired appliance (Fing) along with them. I also have another switch connected, with two Raspberry Pis. The Pis stayed up the entire time and could access the internet. Even the Fing boxes had a connection back to the cloud through the AT&T router, until I restarted the AT&T router. (Let me clarify: the Fing boxes had a connection out, so they were “connected”, but none of the functions of the Fing worked.) After the AT&T router restart, the Fing systems went offline and could not establish a new connection with the server in the cloud.

SOHO went down around 9:30am this morning. Got home around 6:30pm and had to hard power cycle it. The top and bottom of the SOHO typically run mildly warm. Tonight, the top and bottom of the SOHO were hot, so it’s been burning cycles trying to do something.

Not going to open a ticket at this time myself. If anything, I’ll roll back to the earlier firmware and wait until a new version is released.

Thanks.


#14

@peparn

Loaded the config.

I have a few clients connected to the device.

Monitoring is enabled on the device and we will investigate the issue further.


#15

@Nielb

We would appreciate it if you could open a support ticket so that the support team can check on the issue.


#16

@Nielb thank you for taking the time to post here. The issue you had today is exactly like failure case 2 that I posted here. WiFi is gone, but existing outbound connections for wired devices somehow still work. New connections usually fail. I am glad I am not the only one to have run into this :smile: .
Also I forgot to mention earlier, but like you, when the device was in limbo, I noticed it was warmer than usual. Probably because it was busy looping somewhere in its code…
Maybe we get these issues more regularly because our SOHO is under more load?


#17

@sitloongs you are great! Thanks a lot for working on this; I’ll try to help the best I can. So don’t hesitate to ask if you have any questions.
The problem with this kind of issue is that it takes a long time to reproduce: in our case 4.5 days, more or less, but for @Nielb it seems more random. I would think the more devices you connect and the more traffic you create, the more likely it is to happen sooner rather than later. I am hoping it is not a low-level WIFI protocol bug that could be caused by interference or by certain WIFI stacks on certain client devices.
I am crossing my fingers :crossed_fingers: you run into the issue, at least one of the three failure modes.
Finally, I still think it would be a good idea to rename this thread to something like: “SOHO: lockups and reboots” to get more attention and participation from the community.
Cheers!


#18

Hi, I just want to add that I too have had this problem where the router simply locks up and I can’t even connect to it with an ethernet cable. The SSIDs remain broadcasting though… It’s happened maybe 5-10 times since upgrading to 7.1.1. Prior to this, it had been solid since owning it. I have a HW3 Surf SOHO.

I’ve not yet raised any tickets; I’ve been rebooting as and when required. I’d guess I’m also out of warranty now, as I’ve had the router for what will be 2 years come Jan/Feb 2019…

I’ll be a bit disappointed if I’ve got a hardware problem after such a short length of time.


#19

Have just renamed topic to reflect underlying issues. Just to recap on my scenario (and appreciate others’ experiences may differ):

  1. I was having lockup issues with previous firmware (7.1.0) hence the request to schedule reboots and then upgrade to 7.1.1 - I am still doing daily reboots and still on 7.1.1
  2. I’m running a Huawei HG 612 OpenReach box as “modem”, one wired connection to the SOHO + 3 WPA2 SSIDs (2 x 2.4 GHz + 1 x 5 GHz), all on Auto and unlimited clients
  3. I have a Guest VLAN and an untagged VLAN

What seems to happen is that when a new device connects (total somewhere around 7-10), it sometimes, and only rarely (but twice the other day), locks up. When I say locks up, I mean: the WAN light stays solid, the other lights stay on, and no wireless or wired connections work, so obviously no Internet either. It’s not predictable; sometimes new devices just connect and work. I’m not sure that this is a memory leak (unless a very subtle one), as I wouldn’t expect one new connection to behave like that. The only way to recover is a hard power-off (I have to leave it off for a minute or so) and then a restart. The other day it went down twice in short succession after working for a few weeks (albeit with daily reboots).

I don’t have an immediately repeatable test case, but if anyone can suggest things to try, then I will give them a go and disable the reboots (in case they were masking problems - although the 2 lockups in short order would suggest it’s not the amount of time the thing was running). I also genuinely have no idea whether this is software or a hardware fault.

Anyway, hope that’s some help in terms of sharing behaviour.


#20

Thanks for sharing more about your issue @SOHO. It is interesting you were able to tie this to devices connecting to the WiFi network. In your case, this would lean towards a bug in the WiFi stack somewhere that would crash the router under certain conditions. On 7.1 when I had those issues randomly I was thinking of that kind of a bug, however in my case it became oddly regular (4d10h ±6h) with 7.1.1, with the exception of occasional random crashes before it reaches that mark (those are probably the random ones I was getting on 7.1 already). I still can’t understand why it would be so regular on 7.1.1 while it was random on 7.1. There might be multiple bugs we are after, one that was there already on 7.1 causing those random crashes, and a new one linked to resource exhaustion (maybe) on 7.1.1, which would explain why it is so regular given our usage patterns.

Regarding things to try:
Since yesterday I have enabled the watchdog (see here) and disabled my daily reboot, which was there to prevent it from reaching the 4d mark where it would crash and potentially require manual intervention. In theory, with the watchdog activated, it will not require manual intervention when it crashes, so I can just let it run. Since I turned on the watchdog, I have had one automatic reboot, which may have been one of those random crashes, or the watchdog might be a bit too nervous and may have triggered under load if the router was a bit late servicing the timer… not sure yet. I will continue monitoring and see if it can run for multiple days like before. In any case, I would rather have more reboots than hangs that require manual intervention…
Hopefully this gets fixed soon though, because those reboots disrupt work in the office and interrupt conference calls etc…
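
To illustrate the “too nervous” scenario: a watchdog resets the system if it is not serviced (“kicked”) within its timeout, so a busy system that services it late looks the same as a hung one. The toy model below is a generic sketch of that idea, not how the SOHO’s watchdog is actually implemented; the class, the function name and the timings are hypothetical.

```python
class Watchdog:
    """Toy watchdog: considered expired if not kicked within timeout_s seconds."""

    def __init__(self, timeout_s: float, now: float = 0.0):
        self.timeout_s = timeout_s
        self.last_kick = now

    def kick(self, now: float) -> None:
        """Service the watchdog, restarting its countdown."""
        self.last_kick = now

    def expired(self, now: float) -> bool:
        """True if more than timeout_s has passed since the last kick."""
        return now - self.last_kick > self.timeout_s


def first_reset(kick_times, timeout_s):
    """Return when the watchdog would fire, or None if every kick arrives in time."""
    dog = Watchdog(timeout_s)
    for moment in kick_times:
        if dog.expired(moment):
            # The kick came too late: the reset already happened at this time.
            return dog.last_kick + timeout_s
        dog.kick(moment)
    return None


# Kicks every second with a 2 s timeout: no reset.
print(first_reset([1, 2, 3, 4], timeout_s=2.0))
# Under load the third kick slips to t=5: the watchdog fires at t=4,
# even though the system was only slow, not hung.
print(first_reset([1, 2, 5, 6], timeout_s=2.0))
```

A longer timeout makes such false triggers less likely, at the cost of a slower recovery from a real hang.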

Finally, thank you for agreeing to change the thread topic name; it should help get the community’s attention. However, for some reason I don’t see the name change reflected on my side. Was the change saved?


#21

Thanks sitloongs. I’ll open a ticket if it happens again. They’ve had us working crazy hours lately and unfortunately time is something I’m short on at the moment. Although I also like the idea of having a reliable router too:).