FWIW, when I switch modems from one WAN port to the other, the issue moves with the modem, so it’s likely not an AT&T problem and likely not a WAN port problem, hence my thinking that it is more likely Comcast related;
Using 8.8.8.8 and 8.8.4.4 instead of the Comcast DNS servers, and have been since before this whole problem arose;
That is a good question, but I don’t think so as I am getting multiple disconnects in the middle of the night when little to no bandwidth is being consumed;
I will try MTR when I have a moment
It is starting to sound like we have similar symptoms but possibly different root causes. I am going to do one last check of all the coax connections at my location before I get a Comcast tech out here …
1- During one of the disconnects I briefly registered 100% CPU load on the Peplink. Why is there no historical data on CPU load?
2- Are you using InControl2? Again, this is in the realm of non-scientific data, but I closed my InControl2 real-time bandwidth monitor after two disconnects to calm the situation down.
Well… after an uneventful hour, I opened the InControl2 dashboard and clicked on a couple of links. Boooom!!! The ATT line went off. After dealing with this issue for so long my judgement might be questionable, but my perception tells me it was not a coincidence. I will try it again in an hour.
I am not using InControl2, but after re-tightening all of the coax connections leading to the cable modem, I have had zero disconnects in the last 21 hours, which is by no means a record, but at least a good start …
For the sake of others looking for CPU load and other performance data on PepLink: you can enable SNMP v2c on the device and then collect the OID deviceCpuLoad, along with other data.
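For anyone who wants to script that polling, here is a minimal sketch that shells out to net-snmp's `snmpget` and parses the reply. It assumes net-snmp is installed and that the Peplink MIB is loaded so the name `deviceCpuLoad` resolves (otherwise substitute the numeric OID from the MIB file). The `.0` instance suffix, the `public` community string, and the function names are my own placeholders, not values confirmed in this thread.

```python
import re
import subprocess


def build_snmpget_cmd(host, community, oid):
    """Build an snmpget command line for SNMP v2c."""
    return ["snmpget", "-v2c", "-c", community, host, oid]


def parse_integer_reply(line):
    """Pull the number out of a reply like '... = INTEGER: 12'.

    Returns None when the line has no integer value (e.g. a timeout).
    """
    match = re.search(r"=\s*(?:INTEGER|Gauge32):\s*(-?\d+)", line)
    return int(match.group(1)) if match else None


def poll_cpu_load(host, community="public", oid="deviceCpuLoad.0"):
    """Query the device once and return the CPU load, or None.

    ASSUMPTION: 'deviceCpuLoad.0' resolves via a loaded Peplink MIB;
    replace it with the numeric OID if it does not.
    """
    result = subprocess.run(
        build_snmpget_cmd(host, community, oid),
        capture_output=True,
        text=True,
        timeout=10,
        check=True,
    )
    return parse_integer_reply(result.stdout)
```

Calling `poll_cpu_load` from cron or a small loop and appending the value to a file would give the historical CPU data that the web UI does not keep.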
The WAN health check feature is simply a connection test tool that verifies Internet connectivity over the WAN connection. A failed WAN health check means that the traffic sent from the WAN interface did not get a reply. You can find the health check stability (consecutive count) on the support.cgi page.
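To illustrate what a consecutive count means in practice, here is a rough sketch of health-check bookkeeping in general. This is not Peplink's implementation: the class name and the thresholds (3 failed probes to mark a WAN down, 2 successful probes to mark it up again) are assumptions for illustration only.

```python
class HealthCheckTracker:
    """Track consecutive probe results and derive an up/down state."""

    def __init__(self, fail_threshold=3, recover_threshold=2):
        # ASSUMPTION: thresholds are illustrative, not Peplink defaults.
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.consecutive_failures = 0
        self.consecutive_successes = 0
        self.is_up = True

    def record(self, probe_ok):
        """Record one probe result and return the resulting link state."""
        if probe_ok:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if not self.is_up and self.consecutive_successes >= self.recover_threshold:
                self.is_up = True
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0
            if self.is_up and self.consecutive_failures >= self.fail_threshold:
                self.is_up = False
        return self.is_up
```

The point of counting consecutive results is that a single dropped probe does not take the WAN down, but a health check target that stops answering for several probes in a row will, even if the link itself still passes traffic.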
**For WAN health check failure troubleshooting**, you usually need to isolate which of items 1, 2, and 3 above is causing the issue.
Make sure physical connection is fine
Make sure the interface/port speed is explicitly set on both end devices. This rules out an auto-negotiation issue.
Disable the WAN health check and monitor the Internet connection status. If you still face Internet connection issues with the health check disabled, this shows that the Internet connection itself is unstable.
Put a host between the WAN interface and the ISP modem as an isolation test.
Change to reliable health check targets/servers.
Others
Below is a sample test you can use to isolate item 1, the communication between the Balance router and the ISP router.
Thank you for sharing this with us, in particular the support.cgi page on PepLink, which makes it a bit easier to troubleshoot. BTW, are all parameters on the support.cgi page available via SNMP, specifically the Health Check history?
Another question, does Balance 20 have a cap on NAT entries?
I think I am getting closer in my ghost hunting, and I will even share the ghost's name with you once I am done.
So here are the symptoms: the number of users in our office fluctuates during the weekdays. When the maximum number of users are in the office and they send traffic during the morning peak hours, PepLink drops the connection on the ATT Uverse SMB fake fiber line… fairly consistently.
Diagnosis: We have two ISPs: Comcast and ATT Fake Fiber (both 100/20 Mbps). While browsing the ATT modem configuration, I came across the NAT table page.
Now, using the HTTPerf tool, I can easily reproduce the problem by overwhelming the ATT line with newly opened sessions and triggering a PepLink HealthCheck failure. Per the PepLink team's recommendation (somewhere earlier in this thread), I changed the network configuration and placed a managed switch between the PepLink and the ATT modem. Once HealthCheck was configured to ping the directly connected switch, HealthCheck kept the ATT line up even when the link was completely saturated by my session flood.
In conclusion, the ATT SMB Uverse modem has a limit of 2000 NAT entries, and ATT support confirmed that this setting cannot be altered or removed. So, using Outgoing Rules, we are managing to keep the session count lower over the ATT link and route the bulk of the sessions over Comcast.
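As a back-of-the-envelope check on that limit, the sketch below estimates how many users a link can carry before the modem's NAT table fills. Only the 2000-entry figure comes from the modem; the sessions-per-user value and the headroom percentage are numbers I made up for illustration, so measure your own before relying on the result.

```python
def max_users_for_nat_limit(nat_limit, sessions_per_user, headroom_percent=20):
    """Estimate how many users fit under a NAT table limit.

    ASSUMPTIONS: every user opens roughly the same number of concurrent
    sessions, and headroom_percent of the table is kept free for bursts.
    """
    if sessions_per_user <= 0:
        raise ValueError("sessions_per_user must be positive")
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in the range 0-99")
    # Integer arithmetic keeps the result exact.
    usable_entries = nat_limit * (100 - headroom_percent) // 100
    return usable_entries // sessions_per_user


if __name__ == "__main__":
    # Hypothetical office: 50 concurrent sessions per user.
    print(max_users_for_nat_limit(2000, 50))
```

With those made-up inputs the estimate is 32 users, which gives a rough idea of how aggressively the Outgoing Rules need to steer sessions away from the limited link.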
It is not a solution, just a workaround. I did run the session flood loader over Comcast, and to my surprise it triggered a HealthCheck failure on both ISPs: Comcast and ATT.
The only way is to download a Diagnostic Report when the problem occurs. Tech support can then check the concurrent sessions captured in the Diagnostic Report.
Please do the following for me when the problem occurs.
Download Diagnostic Report.
Provide screenshot of both WANs usage (Status > Real-Time).
@ABC-admin’s Balance 20 is overwhelmed; I suggest upgrading the hardware. @TechZo, I would suggest getting help from your point of purchase to further isolate the problem. Normally, a failed WAN health check is caused by the WAN connection, where the Balance router fails to reach the health check target.
I have a similar issue. Out of the blue, my Balance 1350 (8.1.1 build 5006) loses the WAN 1 and WAN 2 health checks in InControl 2, but I can ping the WAN1 port 100%, and it is 100% confirmed to be the correct port. This seems really odd, almost as if it lost its mind after an over-the-air update. We have power-cycled everything and it is still in a disconnected state but pingable. Is there a way to remotely reconfigure the 1350?