Is anyone else experiencing the "stuck in updating routes" bug?

I have had a ticket open since January for a recurring bug in 8.0.2. They seem to be having trouble fixing it, and I want to see whether there are any more examples, since that might help them find it.
The symptom is a SpeedFusion connection stuck in “updating routes”.
It happens sometimes on one end, sometimes on the other, but it is by far more common on the hub B710s.
It happens far more frequently on B710s with many connections.
For example, a B710 with 120 connections may have 2 stuck.
A B710 with 270 connections hovers around 50 stuck, so a MUCH higher percentage.
The only way to resolve it is a reboot, or disabling and re-enabling the profile on the end that shows as stuck.
Note that an additional complexity is that the remote peers have two VPNs to two data centers, so route isolation is turned on everywhere.
It happens seemingly at random, but fairly quickly. I installed patched firmware this morning and of course rebooted. Within two hours one B710 had 12 stuck connections. Now it is three hours later and it is at 16. This will gradually creep up to 40-50.
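
To put numbers on the "much higher percentage" comparison above, here is a small Python sketch. The function name `stuck_percentage` is purely illustrative, and the counts are the approximate figures quoted in the post, not data pulled from the devices.

```python
def stuck_percentage(stuck, total):
    """Return the share of connections that are stuck, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * stuck / total


# Approximate figures from the post: (total connections, stuck connections)
hubs = {
    "B710 with 120 connections": (120, 2),
    "B710 with 270 connections": (270, 50),
}

for name, (total, stuck) in hubs.items():
    print(f"{name}: {stuck_percentage(stuck, total):.1f}% stuck")
```

With these figures the smaller hub is at about 1.7% stuck and the larger one at about 18.5%, so the stuck rate grows much faster than the connection count.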

Can I confirm that the problem is only fixed if you reboot the hub device? Would it be possible to share the ticket number so I can take a closer look?


Hi jmpfas

Did you manage to get a fix for this in the end? We have a similar issue: 2 DCs with a 1350 at each, and ~100 BR1s and HD2s hanging off each. If route isolation is enabled on a site, then that site starts flapping on updating routes and every other site starts seeing quick changes to the routing table. If route isolation is turned off, then the site is stable, but we have the issue that the DCs can route to each other through a BR1.

Thanks
James

@james.webster1, it seems like you are having a different issue. Can you provide a network diagram for this?

It sounds like you turned on route isolation at the spoke sites. Am I correct?


Hey TK

Hope you are keeping well.

So the image is roughly what we have got (except with about 90 remote sites). The firmware across the network is mixed, and the 2 older head office sites are due to be decommissioned in a few months, after the customer migrates their DCs to Azure.

In the image, if the remote sites are all set to have route isolation disabled, then it works fine, but the transport and head office sites can route to each other via the remote sites. If I turn on route isolation on any of the remote sites, then we see the site flapping (updating routes) on the 1350 and the BR1.

I figured that this was because we had the default weight set, so I changed the tunnels to have different weights, but the flapping continued. I tried this on each firmware version, as I figured it might be a problem with the lower firmware, but the 8.0.2 BR1 (and an HD2 I also tried) had the same issue.

Hi James,

I am still alive despite Covid-19. :wink: We are all forced to stay at home here. I hope you are keeping well too.

Back to your reported problem. May I know whether Route Isolation was enabled at Transport 1 or 2 when the problem occurred? Strictly speaking, Route Isolation should be enabled at the hub site. I may need to access the routers to investigate the problem further.

The purpose here is to disallow inter-access between the two head offices. Do you think the design below would help?

Routes from the two Area 0s can’t be advertised to each other, since with Area 1 between them this is a case of “OSPF Discontiguous Areas”. However, routes can be advertised between Area 0 and Area 1.


Thanks for staying on top of things during very challenging circumstances.

I trust all is well with you and yours.

Your support and community engagement is much appreciated.

Z


Hi TK,

Glad to hear it. We are also working from home.

The head office sites belong to the customer, and they have a 3rd party VPN between them to allow replication of their applications. They use DNS for failover so the remote sites can connect to the applications.

We need the Transport sites to share the routes from the head office with the remote sites, but we don’t want the remote sites advertising those routes back to the other transport sites.
I would expect that enabling route isolation on the remote sites would mean that they only advertise themselves to the transport sites. That seems to work, except for the issue where it seems to keep updating routes between the 2 transport sites. Setting different weights for each of the tunnels doesn’t stop the flapping.

My proposed design meets your requirement. The routes from Area 1 will be advertised to both Area 0s and vice versa. However, the routes from the two Area 0s will not be advertised to each other.

Below is the setting for both Transport routers

Below is the setting for all remote routers

If you would prefer to investigate the flapping caused by Route Isolation, please open a ticket. By the way, are all the Transport and remote routers running firmware 8.0.2? We do have a compatibility issue in older firmware once Route Isolation is enabled. Please refer to this release note - https://download.peplink.com/resources/firmware_7.1.1_release_notes.pdf.


Hi TK

Firmware-wise, is this the bug:
[image]

If so, then every site is on at least 6.3.4, as shown in the diagram.

The problem with the OSPF design is that you can’t add PepVPN to two different OSPF areas (all the connections in the diagram are SF tunnels), so I can’t separate them in that way. Without route isolation on the remote sites, we could still have a situation where traffic is routed through the remote sites, leading to greater cellular usage.

The main objective that we are trying to achieve is to have the remote sites only advertise themselves to the transport sites and not advertise any learned routes.
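
To make that objective concrete, here is a toy Python model of the behaviour I am expecting. It is only a sketch of the intent, not how PepVPN or SpeedFusion actually implements route isolation; the function name `advertised_routes` and the subnets are hypothetical.

```python
def advertised_routes(local, learned, route_isolation):
    """Routes a site advertises to its peers in this toy model.

    With route isolation on, only the site's own (local) routes go out.
    With it off, routes learned from other peers are re-advertised too.
    """
    if route_isolation:
        return set(local)
    return set(local) | set(learned)


# Hypothetical subnets, for illustration only
remote_local = {"10.10.1.0/24"}               # the remote site's own LAN
learned_via_transport_1 = {"192.168.1.0/24"}  # a head office subnet

isolation_off = advertised_routes(remote_local, learned_via_transport_1, False)
isolation_on = advertised_routes(remote_local, learned_via_transport_1, True)

print("isolation off:", sorted(isolation_off))
print("isolation on: ", sorted(isolation_on))
```

In this model, with isolation off the remote site re-advertises the head office subnet to the other transport site, which is the transit path over cellular that we want to avoid. With isolation on it only advertises its own LAN.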

Regards
James

I see. I suggest opening a ticket so we can take a closer look at why the SpeedFusion tunnel is flapping after Route Isolation is enabled.

Thanks.
