Starlink firmware updates causing total or partial outages on Peplink routers

I’m sharing this information in case it may be helpful to other Peplink vendors.

It’s possible that what I’m reporting is essentially the same problem reported previously in an earlier forum post. However, we have an open ticket with Peplink and are awaiting assistance to positively identify and resolve the problem.

The previous forum post is this one:

Briefly, we have multiple sites with latest-generation Peplink B One 5G routers directly connected to Starlink routers in bridge mode. In some cases they have a second fast WAN connection plus a cellular backup. In other cases they have only Starlink plus a cellular backup.

It took months to figure this out, but it appears that when Starlink pushes out a firmware update and its equipment reboots, the result is a catastrophic full or partial outage. The cause is the Peplink router’s inability to detect that no traffic can pass from Peplink LAN clients to the Starlink WAN connection after the Starlink reboot.

If you have multiple active WAN connections, the tell is the traffic status reports. Following the Starlink firmware upgrade/reboot, both the Starlink router itself (via the Starlink app) and the Peplink router will appear to be 100% OK (which is NOT true). You will also see zero traffic reported for the Starlink WAN connection in the status display…

I’m very tired tonight, so I’m pasting text from the ticket we currently have open with Peplink on this. Hope it sheds some light for other Peplink vendors and end-users.

Brief Summary of the problem to be solved by Peplink Engineering / Support:

  • Note relevant configuration details:
    • Both problem sites have Peplink B One 5G routers.
    • Both problem sites have Starlink WAN routers connected directly to the Peplink router via “Bridge Mode.”
    • Both sites have Peplink <> Starlink integration features enabled, which presumably give the Peplink router additional insight into the status of the attached Starlink equipment.
  • When Starlink executes a firmware update of its equipment, the Starlink router appears to shut down and reboot as part of this process, which is understandable.
  • The problem we are reporting is that when the Starlink routers reboot following a firmware update, the Peplink B One 5G router is no longer able to pass ANY traffic from Peplink LAN clients to Starlink. The result is a total or partial outage condition experienced by the end-user customer.
  • Under normal circumstances we would expect the Peplink router to detect that the Starlink WAN connection is unusable and to take action to disable the Starlink WAN link. This is NOT happening and is a very serious problem.
    • Instead, what we are observing is that the Starlink routers are reporting they are up and running with no errors at all.
    • Following Starlink upgrade reboots, we are also seeing the Peplink routers report no problems whatsoever with the (dysfunctional) Starlink WAN connections. This means the Peplink router IS able to perform DNS lookups (and likely pings) via the Starlink router AFTER the upgrade/reboot has completed – but NO traffic can traverse from LAN to (Starlink) WAN. In fact, the only visible indication of a problem from the vantage point of the Peplink router is that traffic status reports will indicate NO traffic on the Starlink WAN port.
  • For sites that have TWO active WAN connections (Starlink + Other), when this problem occurs, the Starlink connection remains in service – which results in reports from end users saying that some websites are not accessible at all (those the Peplink outbound rules are routing to Starlink) and some are accessible (those routed to other WAN by outbound load balancing rules). This is a huge problem.
  • For sites that have only ONE active WAN connection (Starlink) plus a cellular standby WAN connection, the standby cellular connection is never enabled by the Peplink router when the Starlink WAN connection is rendered unusable, so ALL traffic to the Internet is halted. This is also a huge problem.
  • Because Starlink firmware updates are pushed out at unpredictable times, this is not something we can test in a lab.
  • The previously-mentioned forum post indicates there may be a patch available from Starlink to solve this problem. If so we would like Peplink’s help to identify and obtain this patch, as mentioned by Sitloong.
    • However, Peplink needs to come up with SOME way to identify this catastrophic condition (at the network level) and force the Peplink router to take one or more appropriate actions to mitigate the damage, such as:
      • Automatically resetting the Starlink WAN connection (IF that will resolve the problem).
      • Adding a new WAN monitoring method that detects whether traffic from the LAN side is ABLE to traverse the WAN connection in question. (Currently, ALL existing monitoring methods are useless for detecting this problem. If I am overlooking something, please let me know.)

Thank you for your attention to this urgent situation.
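To make the failure signature in the ticket concrete (health checks pass, LAN clients are sending traffic, yet the WAN port carries essentially nothing), here is a minimal Python sketch of how such a stall could be detected from periodic counter samples. This is not a Peplink feature or API. `WanSample`, `is_stalled`, and both thresholds are hypothetical names and values chosen for illustration, and it assumes you can obtain per-WAN byte counters from somewhere.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WanSample:
    """One periodic observation of a WAN link (all fields are hypothetical)."""
    health_check_ok: bool  # result of the existing ping/DNS health check
    wan_bytes: int         # cumulative bytes counted on the WAN port
    lan_bytes: int         # cumulative bytes LAN clients sent toward this WAN


def is_stalled(samples: Sequence[WanSample],
               min_lan_bytes: int = 10_000,
               idle_wan_bytes: int = 2_000) -> bool:
    """Return True when the window looks like the outage described above.

    Conditions over the whole window:
      * every health check passed (so normal monitoring sees nothing wrong),
      * LAN clients tried to send at least `min_lan_bytes` toward this WAN,
      * the WAN counter moved by no more than `idle_wan_bytes`
        (a small allowance for the health-check probes themselves).
    """
    if len(samples) < 2:
        return False  # need at least two samples to compute a delta
    first, last = samples[0], samples[-1]
    all_healthy = all(s.health_check_ok for s in samples)
    lan_demand = last.lan_bytes - first.lan_bytes
    wan_moved = last.wan_bytes - first.wan_bytes
    return (all_healthy
            and lan_demand >= min_lan_bytes
            and wan_moved <= idle_wan_bytes)


# Example: checks pass, clients sent 50 kB, WAN counter did not move.
window = [WanSample(True, 5_000, 0), WanSample(True, 5_000, 50_000)]
print(is_stalled(window))
```

If the health checks themselves fail, the function returns False on purpose, since that case is already covered by ordinary WAN failover.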


Scott,
My analysis of this started last year, in April/May 2024, when Starlink locked down the user terminal to a single MAC address. If there was any ARP on the network, Starlink would then send all of that traffic to that MAC… until there was another ARP, and it would switch back. I pointed out at the time that this would preclude any managed network switches. Starlink was rather indifferent: “their routers work”, “we will put in a temporary L3 config, this may disappear in the future”.

If you analyze the ARP/MAC traffic on Starlink, it is not RFC-compliant, but it was stable enough (except for a B20X DHCP LAN->WAN leak that Peplink has since fixed).

Now, about 2-3 weeks ago, Starlink again updated the MAC code to be even more touchy; I don’t think it takes a stray ARP to trigger this condition. My IPv6 handling was clobbered, and yes, after a reboot it seems to be a roll of the dice whether the Starlink terminal returns proper traffic.

You should be able to simulate these conditions with just a reboot; it doesn’t require a software update.

You are not going to find the actual core interaction unless you put a switch with a mirrored port between the Peplink and the Starlink and capture all of the traffic during a reboot cycle. Make sure you capture the Ethernet MAC addresses.

The random variable seems to be which MAC the Starlink sees first after it reboots… if it is the Peplink, it works… if it is something else (spanning tree? NDP?), then it locks onto that.
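If you do get a mirrored-port capture, checking that hypothesis comes down to finding the first frame the Starlink would have seen after the reboot and looking at its source MAC. Here is a rough Python sketch, assuming you have already exported the raw Ethernet frames in capture order, starting from the reboot. The helper names and the MAC addresses in the example are made up for illustration.

```python
from typing import Iterable, Optional


def src_mac(frame: bytes) -> str:
    """Source MAC of an Ethernet II frame (bytes 6..11), as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{b:02x}" for b in frame[6:12])


def first_foreign_mac(frames: Iterable[bytes], starlink_mac: str) -> Optional[str]:
    """Return the source MAC of the first frame NOT sent by the Starlink itself.

    `frames` must be raw Ethernet frames in capture order, beginning after
    the Starlink reboot. Returns None if every frame came from the Starlink.
    """
    starlink = starlink_mac.lower()
    for frame in frames:
        if len(frame) < 14:
            continue  # too short to hold an Ethernet header; skip it
        mac = src_mac(frame)
        if mac != starlink:
            return mac
    return None


# Example with made-up addresses: dst MAC + src MAC + EtherType + payload.
starlink = "02:00:00:00:00:aa"
peplink = "02:00:00:00:00:bb"
bcast = bytes.fromhex("ffffffffffff")
frames = [
    bcast + bytes.fromhex("0200000000aa") + b"\x08\x06" + b"\x00" * 28,
    bcast + bytes.fromhex("0200000000bb") + b"\x08\x06" + b"\x00" * 28,
]
print(first_foreign_mac(frames, starlink) == peplink)
```

If the MAC that comes back is not the Peplink WAN port, that would be consistent with the "locks onto the first MAC" theory, though a single capture would not prove it.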

You can clear the issue with a directed arping from the Peplink.
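For anyone unsure what a directed ARP looks like on the wire, here is a small Python sketch that builds one. It assumes "directed" means a unicast ARP request, where the Ethernet destination is the gateway's own MAC instead of the broadcast address. The function name and the addresses are illustrative only, and the sketch only builds the frame; actually sending it needs a raw socket and root privileges, which is what the `arping` tool takes care of.

```python
import socket
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (or with dashes) to 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return raw


def build_directed_arp_request(src_mac: str, src_ip: str,
                               dst_mac: str, dst_ip: str) -> bytes:
    """Build a 42-byte Ethernet frame carrying a unicast ARP request."""
    # Ethernet header: destination MAC, source MAC, EtherType 0x0806 (ARP).
    eth = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), 6-byte MAC, 4-byte IP, op 1 = request.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac_to_bytes(src_mac) + socket.inet_aton(src_ip)  # sender MAC and IP
    arp += b"\x00" * 6 + socket.inet_aton(dst_ip)            # target MAC zeroed, target IP
    return eth + arp


# Example with documentation-range addresses (not real devices).
frame = build_directed_arp_request("02:00:00:00:00:01", "192.0.2.10",
                                   "02:00:00:00:00:02", "192.0.2.1")
print(len(frame))
```

The only difference from an ordinary ARP request is the first six bytes: a normal request goes to ff:ff:ff:ff:ff:ff, while this one is addressed straight to the gateway.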

Perhaps Peplink will decide to put in a “workaround” for the non-RFC-compliant behavior, but at this time they are probably trying to convince Starlink to adhere to standards.

I have also found that ISPs (especially ones with CGNAT) will intercept PING and DNS probes… I always use an https:// probe, which a middlebox cannot answer on the server’s behalf without failing TLS certificate validation.
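As a rough illustration of that kind of probe, here is a Python sketch that counts a link as healthy only if a full HTTPS request succeeds with certificate verification on (the default for `urllib`). The function name, the timeout, and the status range are my own choices for the example. Note that it does not pin the request to a particular WAN interface, so on a multi-WAN router you would still need some way to force the probe out the link being tested.

```python
import http.client
import urllib.request


def https_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True only if an HTTPS GET of `url` completes successfully.

    Any failure (DNS, TCP, TLS certificate validation, timeout, HTTP error
    status, malformed URL) is reported as False.
    """
    if not url.lower().startswith("https://"):
        return False  # plain HTTP could be answered by an intermediate box
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException, ValueError):
        # URLError, HTTPError, socket timeouts and ssl.SSLError are all OSError.
        return False


# Example: probe a URL of your choosing (example.com is just a placeholder).
if __name__ == "__main__":
    print(https_probe("https://example.com/"))
```

Probing two or three unrelated HTTPS hosts and requiring at least one success would reduce false alarms when a single site is down.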


So this is just an issue with their router? I have some deployments with the dish connected directly to a Peplink, and I don’t think I have had this issue. I may have a use case for keeping the router inline in bridge mode for a future deployment, though.