Balance 20: "WAN1 Disconnected (WAN Failed DNS test)" [RESOLVED]

  1. FWIW, when I switch modems from one WAN port to the other, the issue moves with the modem, so it’s likely not an AT&T problem and likely not a WAN port problem, hence my thinking that it is more likely Comcast related;
  2. I am using 8.8.8.8 and 8.8.4.4 instead of the Comcast DNS servers, and have been since before this whole problem arose;
  3. That is a good question, but I don’t think so as I am getting multiple disconnects in the middle of the night when little to no bandwidth is being consumed;
  4. I will try MTR when I have a moment

It is starting to sound like we have similar symptoms but possibly different root causes. I am going to do one last check of all the coax connections at my location before I get a Comcast tech out here …

One more thing, I’ve increased Health Check Retries from 3 to 5


I tried that too, including PING versus DNS, longer timeout period, longer interval, more retries, etc … all to no avail unfortunately.

Just had 2x ATT disconnects this morning.

1- During one of them I briefly registered 100% CPU load on the Peplink. Why is there no historical data on CPU load?

2- Are you using InControl2? Again, this is in the realm of non-scientific data, but I closed my InControl2 real-time bandwidth monitor after the 2x disconnects to calm the situation down.

Well… after an uneventful hour, I opened the InControl2 dashboard and clicked on a couple of links. Boooom!!! The ATT line went off. After dealing with this issue for so long my judgement might be questionable, but my perception tells me it was not a coincidence. I will try it again in an hour.

I am not using InControl2, but after re-tightening all of the coax connections leading to the cable modem, I have had zero disconnects in the last 21 hours, which is by no means a record, but at least a good start …

For the sake of others looking for CPU load and other performance data on PepLink: you can enable SNMP v2c on the device and then collect the OID deviceCpuLoad, along with other data.
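
As a rough illustration, here is a minimal Python sketch that polls a single value over SNMP v2c by shelling out to net-snmp's `snmpget`. The host address and community string are placeholders, and the symbolic name `deviceCpuLoad` only resolves if the Peplink MIB file is loaded on the polling machine; otherwise substitute the numeric OID from that MIB.

```python
import shutil
import subprocess


def build_snmpget_cmd(host, community, oid, timeout_s=2, retries=1):
    """Return the argv list for an SNMP v2c query using net-snmp's snmpget."""
    return [
        "snmpget",
        "-v2c",                # SNMP version 2c, as enabled on the device
        "-c", community,       # community string (placeholder value in examples)
        "-t", str(timeout_s),  # per-request timeout in seconds
        "-r", str(retries),    # retries before giving up
        "-Oqv",                # print only the value, without OID or type
        host,
        oid,
    ]


def poll(host, community, oid):
    """Run snmpget once and return the value as a stripped string."""
    if shutil.which("snmpget") is None:
        raise RuntimeError("snmpget (net-snmp) is not installed")
    result = subprocess.run(
        build_snmpget_cmd(host, community, oid),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Example (assumed address and community, not from this thread):
#   poll("192.168.1.1", "public", "deviceCpuLoad")
```

Calling `poll` from cron or a loop and appending the result to a file with a timestamp gives you the CPU history that the device itself does not keep.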

Hi,

The WAN health check feature is just a simple connection test tool that verifies Internet connectivity over the WAN connection. A failed WAN health check means that traffic sent from the WAN interface did not get a reply. You can find the health check stability (consecutive count) on the support.cgi page.
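
For anyone trying to reason about the retry settings, here is a minimal sketch of how a consecutive-count health check could behave. The class, the field names, and the separate recovery threshold are assumptions made for illustration; this thread does not describe Peplink's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    fail_retries: int = 3      # consecutive failed probes before the WAN is marked down
    recovery_retries: int = 3  # consecutive good probes before it is marked up again (assumed)
    up: bool = True            # current WAN state
    streak: int = 0            # consecutive probes that contradict the current state

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the resulting WAN state."""
        if probe_ok == self.up:
            # Probe agrees with the current state, so any streak is broken.
            self.streak = 0
        else:
            self.streak += 1
            threshold = self.fail_retries if self.up else self.recovery_retries
            if self.streak >= threshold:
                self.up = not self.up
                self.streak = 0
        return self.up
```

With `fail_retries=3`, the sequence fail, fail, ok, fail, fail, fail only marks the WAN down on the last probe, because the single success resets the count. Raising the retries from 3 to 5 tolerates longer bursts of lost probes, but it does nothing if the probes keep failing for longer than that.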


Health check failure can be caused by the following:

1. Communication between Balance router and ISP router.

  • ISP Router/Modem Hang issue
  • Physical/Port Speeds (Auto Negotiation for interface/Port Speeds)

2. Communication between ISP router and Health check target.

  • ISP service down
  • ISP routing issue

3. Unreliable Health check target.

  • Make sure the health check target is not blocking the traffic.
  • Make sure health check target is reliable

For more information, please refer to the attached diagram:


**For WAN health check failure troubleshooting**, you will usually need to isolate which of items 1, 2, and 3 above is causing the issue.

  • Make sure the physical connection is fine.
  • Make sure the interface/port speed is explicitly set on both end devices. This rules out auto-negotiation issues.
  • Disable the WAN health check and monitor the Internet connection. If the connection still drops with the health check disabled, the Internet link itself is unstable.
  • Put a host between the WAN interface and the ISP modem for an isolation test.
  • Switch to reliable health check targets/servers.
  • Others
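
As a rough illustration of the host-in-between test, the sketch below sends one DNS query from a test host and reports whether any matching reply comes back. It is a simplified stand-in for a DNS health check, not the router's own probe; the function names, the default hostname, and the timeout are assumptions.

```python
import socket
import struct


def build_dns_query(hostname: str, txid: int) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def dns_probe(server: str, hostname: str = "example.com",
              timeout_s: float = 2.0, txid: int = 0x1234) -> bool:
    """Return True if the server answers the query within the timeout."""
    query = build_dns_query(hostname, txid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable-network errors
            return False
    # Accept any reply that echoes our transaction ID.
    return len(reply) >= 2 and reply[:2] == query[:2]
```

Running `dns_probe("8.8.8.8")` in a loop from a host plugged in between the modem and the Balance shows whether the same drops happen without the router in the path, which helps separate item 1 from items 2 and 3.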

Below is a sample test you can use to isolate item 1, the communication between the Balance router and the ISP router.


Thank You

sitloongs,

Thank you for sharing this with us, in particular the support.cgi page on PepLink, which makes troubleshooting a bit easier. BTW, are all parameters on the support.cgi page available via SNMP, specifically the Health Check history?

Another question, does Balance 20 have a cap on NAT entries?

I think I am getting closer in my ghost hunting, and I will even share the ghost’s name with you once I am done.

We do have OID for WAN Health Check State - .1.3.6.1.4.1.23695.2.1.2.1.4
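
Since the OID reports only the current state, one way to build your own health check history is to walk it periodically and log the changes. The sketch below parses numeric `snmpwalk` output for that subtree. The sample output format assumes net-snmp run with `-On -Oe` (numeric OIDs, numeric enum values), and the meaning of each integer is not stated in this thread, so check the Peplink MIB before interpreting the values.

```python
import re

WAN_HEALTH_BASE = ".1.3.6.1.4.1.23695.2.1.2.1.4"  # OID quoted in the post above

# Matches lines such as ".1.3.6.1.4.1.23695.2.1.2.1.4.0 = INTEGER: 1"
LINE_RE = re.compile(r"^(?P<oid>\.[0-9.]+) = INTEGER: (?P<value>-?\d+)$")


def parse_walk(output: str, base: str = WAN_HEALTH_BASE) -> dict:
    """Map each index suffix under `base` to its raw integer value."""
    states = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("oid").startswith(base + "."):
            suffix = match.group("oid")[len(base) + 1:]
            states[suffix] = int(match.group("value"))
    return states


def changes(previous: dict, current: dict) -> dict:
    """Return only the entries whose value differs from the previous poll."""
    return {key: value for key, value in current.items()
            if previous.get(key) != value}
```

Logging the result of `changes` with a timestamp on every poll gives a simple per-WAN state history that can be lined up against modem logs.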

Are you referring to how many Port Forwarding rules can be defined? If so, we don’t cap this.

I was referring to PepLink’s NAT handling egress traffic LAN->WAN. Is there a limit on how many simultaneous NAT sessions PepLink can support?

Thanks

We support 7,500 concurrent sessions for Balance 20.

Is there a way (via WebUI or SNMP) to check the concurrent session number in real-time?

So here are the symptoms: the number of users in our office fluctuates during weekdays. When the maximum number of users are in the office and they send traffic during the morning peak hours, the PepLink drops the connection on the ATT Uverse SMB fake fiber line… fairly consistently.

Diagnosis: We have two ISPs: Comcast and ATT Fake Fiber (both 100/20 Mbps). Browsing the ATT modem configuration, I came across the NAT table page


… and fairly quickly I realized the correlation between NAT table entries (concurrent sessions) and the PepLink ISP HealthCheck status


Now, using the HTTPerf tool, I can easily reproduce the problem by overwhelming the ATT line with newly opened sessions, which triggers the PepLink HealthCheck failure. Per the PepLink team’s recommendation (somewhere early in this thread), I changed the network configuration and placed a managed switch between the PepLink and the ATT modem. Once HealthCheck was configured to ping the directly connected switch, it kept the ATT line up even when the link was completely saturated by my session flood.

In conclusion, the ATT SMB Uverse modem has a limit of 2000 NAT entries, and ATT support confirmed that this setting cannot be altered or removed. So, using Outgoing Rules, we are managing to keep the session count lower on the ATT link and route the bulk of the sessions over Comcast.
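
To make the reasoning concrete, here is a toy model of the idea: keep the ATT session count below the modem's NAT limit and send everything else to Comcast. This is not Peplink configuration, and Outgoing Rules do not necessarily work this way; the function name and the 80% safety margin are assumptions for illustration.

```python
ATT_NAT_LIMIT = 2000    # NAT table limit of the ATT modem, from the post above
SAFETY_MARGIN_PCT = 80  # assumed headroom so ATT never reaches the hard limit


def choose_wan(att_sessions: int,
               limit: int = ATT_NAT_LIMIT,
               margin_pct: int = SAFETY_MARGIN_PCT) -> str:
    """Pick the WAN for a new session based on the current ATT session count."""
    threshold = limit * margin_pct // 100  # 1600 with the defaults
    return "ATT" if att_sessions < threshold else "Comcast"
```

With the defaults, new sessions go to ATT until 1600 are active and to Comcast after that, which leaves room for sessions the modem has not yet aged out of its table.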

It is not a solution, just a workaround. I did run the session flood loader over Comcast, and to my surprise it triggered HealthCheck failures on both ISPs: Comcast and ATT.

Your mileage may vary…

The only way is to download the Diagnostic Report when the problem occurs. Tech support will help check the concurrent sessions captured in the Diagnostic Report.

Please do the following for me when the problem occurs.

  • Download Diagnostic Report.
  • Provide screenshot of both WANs usage (Status > Real-Time).

Thank you.

Here is a capture of one of the occurrences this morning



Where should I upload Diagnostic report?

Thanks

Please open a ticket to submit the screenshots and the Diagnostic Report.

Thank you.

Having the same issue on a Balance 305 - ATT + Comcast
ATT disconnects.

Were you able to find a solution?
@ABC-admin
@TK_Liew

@ABC-admin’s Balance 20 is overwhelmed, and we suggested upgrading the hardware. @TechZo, I would suggest getting help from your point of purchase to further isolate the problem. Normally, a failed WAN health check means the Balance router could not reach the health check target over that WAN connection.

I have a similar issue. Out of the blue, my Balance 1350 (8.1.1 build 5006) lost the WAN 1 and WAN 2 health checks in InControl2, yet I can ping the WAN1 port with 100% success, and I have confirmed it is the correct port. This seems really odd, almost as if it lost its mind after an over-the-air update. We have power-cycled everything and it is still in a disconnected state but pingable. Is there a way to remotely reconfigure the 1350?