Surf SOHO MK3 never finishes connecting to cable modem if modem reboots


#1

Hello Everyone,
I have had a Surf MK3 for a bit over a month now and I am very happy with it, except for one thing: if the modem reboots, or boots after the Surf is up, the Surf never finishes connecting to it. It stays there saying “Connecting…” with a spinner. Exact details below. When this happens, the only way out is to disconnect the WAN ethernet cable, reboot the MK3 (hard reboot/power cycle; in most instances a soft reboot is not enough), reboot the cable modem separately and, when everything is fully up, reconnect the WAN cable.
I have spent countless hours experimenting with all the settings, reading forum articles and following the advice given there (for example setting the port speed to 1000Mbps without advertise speed), without success. I even tried putting a switch in between the cable modem and the Surf. At this point I don’t think this is a port negotiation/physical layer issue. Occasionally it stops happening for 2 or 3 tries, but after all these tests I am convinced these are just lucky timing flukes.
At first I had an Arris SB6121. Many people seemed to be complaining about it, so I went to an Arris SB6183, which a user said worked well… same problem. Then I discovered Arris/Motorola modems were said to have problems with Peplink, so I bought yet another modem, a NetGear CM-500, which was listed as working well in a few places, and… same problem. It is very frustrating because each time Comcast has a small interruption, the modem restarts, my Surf gets hosed and I have to physically take care of it… until then the whole small office network is down… See the thread where I initially reported the problem a couple of days ago: SB6121 Cable Modem. I am starting this new one because the issue is modem independent.
If the connection gets made, it is extremely stable until the modem reboots, which could be weeks, days or hours… you know Comcast… :slight_smile:

Am I the only one experiencing this problem? I can’t bring myself to believe that the Surf would be incompatible with more than half of the cable modems out there. Technical details below.

At this point I would love to see a software fix (it feels like it should be fixable in software), but I would be happy with any settings/workaround in the meantime so that I don’t have to do something every time Comcast cycles… Any ideas?

Thanks.

Technical details:
Surf MK3, FW 7.1.0 build 1284
Modems: Arris SB6121 and SB6183, NetGear CM-500
Cable Operator: Comcast

The key observation, which might help someone who knows what the Surf is looking for understand what is going on, is this: when the modem reboots, it initially connects to the Surf without any problems, before the link to the internet is up. I am not sure why modems do that, but they do offer an IP address before there is a connection to the internet. As soon as the link to the internet is up, however, the connection status changes to “Connecting…” and that is it, forever… It seems that something happens on the modem side when the link comes up that confuses the Surf in an unrecoverable way.
From the outside it looks like something that could be fixed in software, even if there are some low level reasons involved. I am wondering if this could also be specific to the modem firmware run by Comcast? I find it odd that all three modems would have exactly the same behavior and chain of events to get there. Could the Comcast firmware explain some of it?

Here are the details of the WAN states observed when the modem reboots:
Cable not connected
Connecting…
Obtaining IP address…
Checking Connectivity…
DNS Check Failed – At this point the internet link is still not up, so no surprise
– Link goes up
Connecting… – never leaves that state
(Obtaining IP address… sometimes shows up once briefly too and then goes back to Connecting…)

After that point the reboot procedure mentioned above is the only recovery path. Unplugging the WAN cable is paramount to ensure the Surf and the modem don’t talk before the internet link is up. Done that way, I have never seen the recovery fail, even after dozens of tests. I would think the Peplink team can leverage this for debugging the issue.
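
For anyone who wants to spot this condition automatically, here is a minimal Python sketch. It assumes you already have some way to log the WAN status text with timestamps (how to read it from the Surf is not covered here), and that a healthy link reports a status of "Connected"; that label, the function names and the 10 minute timeout are all assumptions for illustration.

```python
from typing import Iterable, Tuple

# Assumed label for a healthy WAN; every other status counts as "not connected".
HEALTHY_STATUS = "Connected"


def seconds_not_connected(events: Iterable[Tuple[float, str]], now: float) -> float:
    """Return how long the WAN has gone without reporting HEALTHY_STATUS.

    events: (timestamp_in_seconds, status_text) pairs in chronological order.
    """
    down_since = None
    for timestamp, status in events:
        if status == HEALTHY_STATUS:
            down_since = None          # link recovered, reset the clock
        elif down_since is None:
            down_since = timestamp     # first non-healthy status of this episode
    return 0.0 if down_since is None else now - down_since


def is_stuck(events: Iterable[Tuple[float, str]], now: float, timeout_s: float = 600.0) -> bool:
    """True if the WAN has not been healthy for at least timeout_s seconds."""
    return seconds_not_connected(events, now) >= timeout_s
```

Counting from the first non-healthy status, rather than from the last "Connecting…", means the brief "Obtaining IP address…" flicker does not reset the timer. Note that this only tells you the WAN has not recovered; it cannot tell a stuck Surf apart from an ISP outage that is still ongoing, so the timeout should be longer than a typical outage plus modem boot.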

I would be happy to give more details if needed.


#2

We have seen this issue many times with the Balance 20 and SOHO routers and Arris modems. We know some of the modems have “issues” but we are convinced the Peplinks do as well.


#3

I am wondering why there aren’t more reports about this? Is it because most Peplink customers use professional grade equipment for their internet connection instead of retail modems, or are they connecting the Peplinks behind an edge router which would shield them from that?
And it doesn’t seem to be a hardware issue per se, because it happens behind a switch as I mentioned, which isolates the SOHO from the modem connectivity-wise. That would make it a protocol/handshake issue, it would seem.
Oh, and putting the Surf behind another NAT router fixes the problem, and the other router does not have a problem at all with any of those modems. But it is impractical to have two NATs stacked up just to shield the Surf from the modem, and it is also bad performance-wise. And while it works with that old cheapo router I tried, I would have to invest in a good up-to-date router just to put in front if I wanted to do it right, defeating the purpose of having a Peplink at all…
I hope someone will have a magic answer :slight_smile:


#4

Because it is not a known or widespread issue. How often is your CM rebooting and why?


#5

Hi @Tim_S. That’s a really good question and we’ve wondered the very same thing. We have found the problem to exist when the modem reboots but there is another issue – the Peplink <–> modem connection sometimes just “breaks.” We have not been able to correlate this with any other activities. We’ve tried several modems but the Arris modems are the worst – particularly 6121 and 6141.

Here’s another data point: We have several Balance routers in the field, all in austere environments, where IOGear GWU627 wi-fi/N adapters are used as a bridge between 4G hotspots and Balance routers – like this: Sprint hotspot <–> GWU627 <–> Balance20. (Yes, I know this is not the ideal solution but it’s what we have to work with.) The ethernet connection between the IOGear and the Balance sometimes inexplicably “breaks.” Often it is difficult to re-establish – just like with the cable modems. Sometimes it will work again without intervention but more commonly someone has to visit the site and play with it some.

Pretty weird.


#6

Hi @Rick-DC I agree it is very weird and makes troubleshooting difficult and frustrating!


#7

Hi Tim,
Nice to have someone from the Peplink Team take interest.
First, to answer your question: I should have been clearer. The modem doesn’t spontaneously reboot. Comcast has quite a few outages in my area. It can go for weeks without one, and then in a single week there can be 3 or 4 outages lasting from 10 minutes to 2 hours. When the cable comes back, the cable modem goes through a sequence very similar to when it boots up after power cycling or rebooting, but technically it does not reboot. I hope this clarifies. As far as I am concerned, none of the modems I have tried have any stability issues in my environment. Unless Comcast has an outage in my area (affecting all customers in that area), they don’t drop the connection (to either the Surf or the internet), and everything is very stable. Very good SNR and dBmV levels.

Going back to the problem: today I did more experiments and got some kind of a breakthrough in understanding what is going on.
I suspected that the fact the CM offers a local DHCP lease before the internet is up was causing the problem. So I took the last IP, gateway, mask and DNS servers that were assigned to me and set those as static IP settings in the WAN settings, just to see what would happen. Then I power cycled the CM and it came up as usual, offering a WAN link before the internet up-link was established. The Surf tried to do a health check and failed as expected, then the CM got the internet up-link, and a few seconds later the health check passed and success: connected… After that I can even change to DHCP; it cycles the WAN connection, gets an IP and everything works. So it seems that the local DHCP server is messing up the Surf.
When starting the modem with the WAN set to DHCP on the Surf, the Surf gets a first IP in the range of 192.168.100.x before the internet is up and fails that health check, as it should. But for some reason, when the internet comes up, the Surf fails to acquire a new IP address from Comcast’s DHCP server and sits there on “Connecting…”. I don’t have a packet/protocol analyzer to capture a trace unfortunately, but somehow the Surf fails to acknowledge the change in the interface when the CM switches to online mode, or it fails to broadcast a new DHCP request, or it misses the response: any or all of those.
The steps above are 100% repeatable. And it looks like this should be a simple fix for someone with the right tools to understand exactly at what stage the Surf needs to retry.
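
To make the two DHCP phases easier to tell apart in a log, here is a small Python sketch that classifies a WAN lease. It assumes the modem's pre-online pool is exactly 192.168.100.0/24 (I only observed addresses of the form 192.168.100.x, so the /24 mask is a guess), and the function and constant names are made up for illustration.

```python
import ipaddress

# Assumption: the cable modem's local (pre-online) DHCP pool is 192.168.100.0/24.
MODEM_LOCAL_NET = ipaddress.ip_network("192.168.100.0/24")


def classify_wan_lease(address: str) -> str:
    """Label a WAN address as 'modem-local', 'private' or 'public'."""
    ip = ipaddress.ip_address(address)
    if ip in MODEM_LOCAL_NET:
        # Lease handed out by the modem itself before the up-link is established.
        return "modem-local"
    if ip.is_private:
        # Some other RFC 1918 style address, e.g. behind another NAT router.
        return "private"
    return "public"
```

If the WAN is still holding a "modem-local" address well after the modem reports it is online, that would match the failure described above, where the Surf never picks up the second lease from the ISP.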

Sadly, this understanding of the situation does nothing to fix the issue, as reusing the static IP from my previous DHCP lease is not viable over time, so I have to use DHCP.

While this does not solve my problem, it can maybe explain why this issue is not known or widespread as you said. For this to happen and be reported you need to be in the following situation:

  1. No static IP address (WAN set to DHCP)
  2. A cable modem that offers a local DHCP address prior to internet being available (all residential Comcast compatible modems I have seen seem to do that for reasons that escape me, what good does it do???)
  3. Enough Comcast outages to notice the situation and report it vs ignoring it

If you have a static address then the issue does not happen, and I suspect most of your customers are reasonably sized companies and thus have a static IP for their connection. Or at least they may have a business grade CM which does not have that quirky, useless DHCP server.

Any idea about what can be done? I really think someone should look at this because, while it might be an uncommon situation for your customer base, it is systemic and fixing the issue would contribute to making your products better (and it is probably not hard to fix; all my other routers have no issue with this). I am opening a support ticket in addition to this post.

In the meantime, if anyone has an idea for a workaround to deal with this double DHCP cycle confusing the Surf, it would be great. If there is a modem that is good and stable that doesn’t have this local DHCP server, I could also change one more time.

Cheers!


#8

@peparn
Found that you had opened a support ticket on this. Let’s work this out through the support ticket. We need to investigate from the device in order to identify the issue. In most cases, this kind of issue is related to port auto-negotiation between two devices that are not compatible with each other. Let’s see what we can do about this.

@Rick-DC, can you also submit a ticket so that we can check further?


#9

Thanks @sitloongs for the follow up. I have responded in detail to the ticket via email. I will post the outcome here if we get to a resolution.
Thanks.


#10

Hi. OK. Done. TU! Ticket 785172.


#12

Just as an FYI, the problem is not limited to Comcast. I have experienced this on two different Surf SOHOs in different locations both using Spectrum/Time Warner.

Great debugging, by the way.


#13

@Michael234

Would you be able to open a support ticket for us to check as well? Sad to say, most SOHO tickets are opened by end customers who cannot provide much info.

We are working with @peparn and @Rick-DC :+1::+1::+1::+1: … they have been very helpful in performing tests and collecting info from the field.


#14

Happy to help :blush:, this is great team work, and working with @sitloongs is a pleasure.
I am new to the Peplink community, but so far I am impressed by the responsiveness of their support team and the dynamism of the community. Plus the Peplink devices are feature-packed at all price points.
The issue we are working on here seems to just be a symptom caused by more profound issues that could be serious, so it is well worth the time to get to the bottom of this, if it makes the Peplink devices better and more secure for everyone.
Cheers!


#15

Just wanted to weigh in here. First, @peparn thank you for posting this because you’ve thoroughly replicated the exact issue I’m also facing and you saved me a significant amount of time troubleshooting.

Technical details:
Surf SOHO MK3, Firmware 7.1.0 build 1284
Modem: Arris TM1602
Cable Operator: Optimum Online (no static IP - dynamic only)

I can also confirm that connecting an old ASUS RT-NT66U router (using Asuswrt-Merlin firmware) between the cable modem and the SOHO resolves this issue. Of course, this is not ideal from a network architecture perspective.

The only workaround I’ve found that allows me to connect the SOHO directly to the cable modem - and have it be able to recover if the cable modem cycles - is to use the following settings:

Port Speed: 1000Mbps Full Duplex with “Advertise Speed” unchecked
MTU: Auto
Health Check Method: Disabled

Disabling the health check obviously defeats the ability to use fallback WAN connections. But besides the occasional cycle, Optimum has been very reliable, so I’m willing to make this trade-off for now.

I do want to add that this is definitely an issue with the SOHO itself. Arris cable modems may have quirks of their own, but the SOHO should be resilient enough to deal with any device it’s connected to. And it seems this issue is not specific to Arris cable modems anyway. It’s also likely affecting more users than Pepwave realizes. Many might not have the patience or proclivity to troubleshoot something like this (and take the time to read/post on forums) and will instead throw up their hands and move on to another brand.

Hopefully Pepwave has been given enough ammo now to figure this out once and for all. Would greatly appreciate any updates on progress/resolution. The SOHO is an amazing device, and I’m pleased to see an active forum with involvement from the support team.


#16

The issue of fully reliable cable modem connections to the WAN ports of SOHOs and Balance 20s, in particular, has been discussed numerous times – quite a number of threads. And, Peplink engineers have spent a l-o-t of time on this. I join others in sending my thanks for Peplink’s involvement and dedication trying to understand what’s happening and working toward a resolution.

I wonder if this would be helpful: We have three cable modems that have had problems with Peplink routers. One of them, an Arris 6141, is at my present location. When the carrier, Time Warner/Spectrum, replaced it with one of their own, health checks started passing again. (The new modem is a Technicolor DPC3216 DOCSIS 3.0 – and is working fine.)

An offer: If it would be helpful, I will be pleased to send this modem to Peplink’s technical organization anywhere in the USA if there is a desire to check for issues in a lab environment. We won’t want it back. If this would be useful, I’d ask someone from Peplink to PM or e-mail me shipping address and I’ll send it off for evaluation right away.

Rick


#17

I’m curious if you guys have tried setting static DNS servers instead of the default obtain automatically. Try setting them to 8.8.4.4 and 8.8.8.8 and see how it behaves. Thanks


#18

Hi Tim. There have been two issues. One is the WAN being marked as unhealthy when we believe it is OK. We’ve used the following DNS servers: 1.1.1.1, 8.8.4.4, 8.8.8.8, 9.9.9.9, 208.67.220.220, 208.67.222.222 – as well as the various carriers’ DNS servers. No difference in results that we can discern, except we have less confidence in the servers run by the ISPs than we do in Google, Quad9, OpenDNS, etc. And we have more confidence in DNS health checks than in ICMP ones.
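
A minimal Python sketch for cross-checking resolvers like the ones listed above from a host on the LAN. It is not Peplink's health check; it hand-builds a standard DNS A query and reports whether each server sends back any response at all. The `example.com` test name, the 2 second timeout and the function names are assumptions for illustration.

```python
import socket
import struct

# Resolvers mentioned in this post.
RESOLVERS = ["1.1.1.1", "8.8.4.4", "8.8.8.8", "9.9.9.9",
             "208.67.220.220", "208.67.222.222"]


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.strip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def resolver_answers(server: str, hostname: str = "example.com",
                     timeout_s: float = 2.0) -> bool:
    """True if server returns a DNS response matching our query ID."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable networks
            return False
    # Same ID as the query, and the QR (response) bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

Calling `resolver_answers(server)` for each entry in `RESOLVERS` while the router reports the WAN as unhealthy would show whether DNS is actually reachable, assuming the test host's traffic is routed out through that same WAN.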

The greater issue, in our view, is the one raised in a number of threads – not that apparently healthy WANs fail checks, but that the SOHO and Balance 20s (maybe more Peplink products – these are the ones we’ve had the most difficulty with) seem to find it difficult to connect to – and REMAIN connected to – certain devices – particularly Arris modems and the IOGear GWU627 wi-fi/Ethernet adapter. The behavior is described in other threads. Sometimes the router will connect to the modem/adapter; often not. Sometimes a reboot of one device will fix the problem; sometimes not. Sometimes a dumb switch placed between the devices will “fix” the problem; sometimes not. Sometimes the Peplinks and the modems/IOGear will reconnect on their own; sometimes not. (And why would two devices “talking Ethernet to each other” suddenly disconnect and not reconnect for seconds, minutes, hours or days?) We’ve had one WAN out of service for as long as two months because the installation was in a location difficult to get to and there was no possibility of human intervention. (Everything in all locations runs on a sine wave UPS.) It was in the latter context that I offered a modem for evaluation: one with which we had difficulties connecting – not failing health checks – merely remaining reliably connected to a Balance 20.

So, is it the MODEM and wi-fi/ethernet adapter that’s the problem? Or is it the B20 and SOHO? Frankly, we don’t know. Maybe the Peplinks are not quite “to spec” vis-a-vis the Ethernet stack/protocol? We have one situation, previously described, where the modem was replaced with a Technicolor and the issue seems to have been resolved. Conclusion: It was the modem that was the issue. We have another location where we tried THREE modems and each one had an issue. Conclusion: The SOHO had a problem.

Side note: Our ticket 785172 is still open and @sitloongs is monitoring the connection between IOGear and a B20. (Since we opened the ticket we’ve seen one disconnection – the B20 and IOGear seem to be on “good behavior” right now. ;<) ) But why did it happen? Dunno.


#19

Thanks for the response Rick, hopefully our guys can help get to the bottom of this!


#20

I would like to chime in briefly and say there are multiple issues stacking up in this thread. Issues that probably have very different root causes based on every individual setup. So I think it is important that people open individual tickets to allow support to properly classify and tally the problems, and give a clear picture of each of those problems to the engineering team so that they can make progress. Reading all the cases above, I think it is too easy for all these problems to appear to be the same based on somewhat symptomatic descriptions, while they are not.

In my case, for example, there doesn’t seem to be any stability issue with the connection between the SOHO and the modem. After many hours of low level diagnostics and packet inspections, I believe we have figured out the source of my problem, which stems from a leak of LAN ARP broadcasts onto the WAN port under certain circumstances which were identified. I will not expand further at this time as we are waiting for confirmation from the Engineering team. @sitloongs or I will post an update here with more details (including potential short term mitigations/workarounds) when it is fully confirmed by Engineering.

As said before, I am in awe of how well support and engineering work at Peplink. Yes, there are issues, but instead of denying them and giving you the runaround, they are embracing them and taking the opportunity to improve their product. And it starts with the built-in diagnostics tools :slight_smile: Any similar issue with most other brands would have resulted in frustrating interactions, ending at a wall that blames the customer for the problems, with no solution in the long term… So kudos Peplink!


#21

Just another sample point:

Setup:
Balance 380 HW6, 7.1.0 build 2287
WAN 1 connected to a Netgear CM700 cable modem to Comcast
[WAN 2 connected to a sonic.com [recommended] DSL modem]

Replicable experience:

  1. Unplug WAN 1 (to the cable modem) (or reboot the B380).
    Observe: Connection goes down (duh!)
  2. Plug it back in again.
    Observe: Connection comes back up, the same IP address (“IP1”) is assigned from Comcast. All is well.
  3. Unplug WAN 1, plug the cable modem into WAN 3.
    Observe: Connection WAN 1 goes down, the connection at WAN 3 is stuck at “connecting…”
  4. Reboot (power down, then up) the cable modem
    Observe: WAN 3 comes up, with a different IP address (“IP2”, not IP1) assigned from Comcast.
  5. (Just for fun:) Unplug and replug back in WAN 3.
    Observe: Connection comes back up, same (WAN 3) IP address - IP2 - assigned.
  6. Unplug from WAN 3, replug it into WAN 1.
    Observe: Connection WAN 3 goes down (of course), the connection at WAN 1 is stuck at “connecting…”
  7. Reboot (power down, then up) the cable modem
    Observe: WAN 1 comes up, with the original WAN 1 IP address assigned from Comcast. All is well.

As a side note: When the cable modem is rebooted the WAN connection has a brief interlude with a 192.xxx.xxx.xxx address assigned, then drops that before reconnecting with a public address.

Conjecture:
Comcast/the modem is keyed on the MAC address of the router port to which it is connected. A change in MAC address requires a modem reboot to get a workable IP address to the (new) port.
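
To make the conjecture concrete, here is a toy Python model of it. This is only a sketch of the hypothesis, not of how the modem or Comcast actually work: it assumes the modem serves only the first MAC address it sees after a boot, and that each MAC consistently maps to the same public IP. The class name, method names and MAC/IP placeholders are all made up.

```python
class ModemModel:
    """Toy model: after each boot the modem locks onto the first MAC it sees."""

    def __init__(self, ip_for_mac):
        # Assumption: the ISP hands each MAC the same address every time.
        self._ip_for_mac = dict(ip_for_mac)
        self._bound_mac = None

    def reboot(self):
        """Power cycle: the modem forgets which MAC it was serving."""
        self._bound_mac = None

    def request_lease(self, mac):
        """Return the public IP for mac, or None if the modem is bound elsewhere."""
        if self._bound_mac is None:
            self._bound_mac = mac
        if mac != self._bound_mac:
            return None  # the router port would sit at "connecting..."
        return self._ip_for_mac[mac]
```

Walking the model through steps 1 to 7 above gives the same pattern: a port that was not the first one seen since boot gets nothing until `reboot()` is called, and each port gets its own address back afterwards.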

As a PS:
This is the minimal demonstration of the problem - moving the cable modem connection from one WAN port to another.
Moving the connection to a different device (e.g., a WAN port of a different router) exhibits the same problematic behavior.

Work-around:
None really needed - the problem occurs only with MAC changes, i.e., when the ethernet cable is moved from one port (or device) to another, at which point one has to remember to reboot the modem.

FWIW: We have plugged the cable modem into an IoT power plug, and can use the other WAN connection to reboot (power down and up) the cable modem in case one forgets the modem reboot step (such things happen).