Speedfusion crashes when there are route conflicts

Had a problem tonight that took us a while to diagnose.
This is complicated to explain, but I believe the speedfusion engine in one of my Balance 710s was crashing every ten or twelve minutes, and that route conflicts were the cause.

Setup:
Two data centers.
Each has two pairs of B710s in high availability. One pair currently has 290+ speedfusion peers, the other about 40 (I am looking at starting to use FusionHub for growth).
The B710s in the two data centers have a speedfusion link between them.
Each remote site has a MAX-BR1 with a SF connection to each data center.
Each MAX-BR1 has (among other things) a small private subnet, usually a /28.
This subnet is advertised over the SF/OSPF cloud.

Now - one of my techs made a mistake and re-used 5 subnets, i.e. he configured five NEW pepwaves with five subnets that were already in use on five existing pepwaves. Several of these went live yesterday.
The B710 does show these route conflicts, but I am not sure what it is doing other than displaying the text “route conflict”. No one noticed that, since you have to be logged into the B710 on the dashboard to see it.
We started having “weird issues” this afternoon, escalating to having all the SF routes drop and restart every ten or twelve minutes. Unfortunately our data center had DoS attacks the last two Fridays, so I assumed it was that and basically yelled at them for about an hour instead of figuring out that it was us.
Then I noticed that it was only on the one B710 that the routes were dropping. Then I noticed the “route conflict” and took down those 5 devices and the problem stopped.

I have two requests:

  1. Check into this crash - a route conflict should not crash the OSPF engine.
  2. And THIS IS THE IMPORTANT ONE. I have repeatedly asked for route conflicts to be an ALERT EVENT. Humans make mistakes. Route conflicts happen. If we had received “warning - route conflict subnet xx.xx.xx.xx on peers XXXX and YYYY” it would have been fixed within minutes and likely caused no harm.
    Also, is there any way that the OSPF “hub” (the B710) can tell the remote peer that there is a conflict, so that when a tech is setting up the MAX-BR1 and brings up the speedfusion connections they turn red and show “ROUTE CONFLICT YOU IDIOT!!!”? That way they would see it and fix it at once.
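
To make the alert request concrete, here is a minimal Python sketch of the kind of check I mean. It is not Peplink code and says nothing about how the B710 works internally; the function names, peer names, and subnets are all made up for illustration. It compares every advertised subnet against those of every other peer and builds a warning in the format above.

```python
import ipaddress
from itertools import combinations


def find_route_conflicts(advertised):
    """Return overlapping subnets advertised by different peers.

    advertised: dict mapping a (hypothetical) peer name to a list of
    CIDR strings, e.g. {"site-a": ["10.1.0.0/28"]}.
    """
    routes = [
        (peer, ipaddress.ip_network(cidr))
        for peer, cidrs in advertised.items()
        for cidr in cidrs
    ]
    conflicts = []
    # Pairwise comparison; fine for a few hundred peers with one /28 each.
    for (peer_a, net_a), (peer_b, net_b) in combinations(routes, 2):
        if peer_a != peer_b and net_a.overlaps(net_b):
            conflicts.append((net_a, peer_a, net_b, peer_b))
    return conflicts


def format_alert(conflict):
    """Build the warning text for one conflict tuple."""
    net_a, peer_a, net_b, peer_b = conflict
    return (
        f"warning - route conflict subnet {net_a} on peer {peer_a} "
        f"overlaps {net_b} on peer {peer_b}"
    )


if __name__ == "__main__":
    # Made-up example: site-c re-uses the subnet already assigned to site-a.
    example = {
        "site-a": ["10.1.0.0/28"],
        "site-b": ["10.1.0.16/28"],
        "site-c": ["10.1.0.0/28"],
    }
    for conflict in find_route_conflicts(example):
        print(format_alert(conflict))
```

Something like this could also be run by hand against an exported list of site subnets before a new device goes live.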

@jmpfas

Well acknowledged; your request has been submitted to the Engineering team. The Engineering team agrees with the concerns and is considering the feasibility of improving this.

I will move this to the feature request group.


Was any progress ever made on this?

We are about to implement a similar configuration and are slightly concerned about what might happen if someone puts the wrong subnet on a router at the edge.

Bouncing this one back to the top again - we have now implemented this, and the route conflict still does not get reported very well. You can really only see it on the Speedfusion status page.

Any chance we can get a route conflict to be an alertable event?

Maybe a new speedfusion peer could also be ‘blocked’ from connecting if it tries to advertise a route that would conflict with an already established one?
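
As a rough illustration of that ‘block on connect’ idea, here is a small Python sketch. It is purely hypothetical: the function name, peer names, and subnets are my own assumptions, and it does not reflect how the Balance actually admits peers. It checks a new peer's advertised subnets against the routes of already established peers and refuses the peer if any overlap.

```python
import ipaddress


def admit_peer(established, new_peer, new_cidrs):
    """Decide whether a new peer may connect.

    established: dict mapping existing peer name -> list of CIDR strings.
    new_peer:    name of the peer trying to connect (hypothetical).
    new_cidrs:   list of CIDR strings the new peer wants to advertise.

    Returns (ok, reasons): ok is False if any subnet overlaps a route
    that is already established; reasons lists each conflict found.
    """
    reasons = []
    for cidr in new_cidrs:
        new_net = ipaddress.ip_network(cidr)
        for peer, cidrs in established.items():
            for existing in cidrs:
                if new_net.overlaps(ipaddress.ip_network(existing)):
                    reasons.append(
                        f"{new_peer}: {new_net} conflicts with "
                        f"{existing} on {peer}"
                    )
    return (not reasons, reasons)


if __name__ == "__main__":
    # Made-up example: the new site re-uses a subnet already live on site-a.
    live = {"site-a": ["10.1.0.0/28"]}
    ok, reasons = admit_peer(live, "site-new", ["10.1.0.0/28"])
    print(ok, reasons)
```

The established peer keeps its route and only the newcomer is refused, so a typo at the edge cannot disturb a site that is already working.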

@sitloongs
