We had a regional issue with Verizon FIOS Internet access last night that took down about 25 of our customers, all running SIP traffic, at the same time. Luckily, about 20 of them had at least one more ISP available that we could manually fail over to.
So FIOS was “apparently” upgrading their fiber backbone and decided to do the upgrade at 5 pm without announcing it. All the customers that have FIOS and, let's say, Cablevision are set up with an Outbound Policy using the Priority algorithm, with FIOS as primary and “X” as backup. It is set up to terminate sessions on “X” once FIOS has re-established itself.
Here was the HUGE problem: FIOS never technically “Disconnected”, and I was getting no notifications on the Utility App saying that it had, either. At first I thought it was an issue with our servers, but it turned out not to be. When we tried to ping and traceroute from inside the Peplink, we weren’t able to reach our feature servers through the Verizon circuit. As soon as I manually disabled FIOS and all traffic failed over to “X”, the phones immediately registered. In all, roughly 20 customers in this particular area were affected.
The thing is, most of these customers were still able to surf the web via the FIOS connection. I will never get a true answer from FIOS, but I think that during their work they “Blocked” certain protocols, such as maybe SIP or some RTP traffic, which in turn took down the phones.
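For anyone wanting to test a suspicion like this, one rough way to check whether SIP signaling gets through a given path is to send a SIP OPTIONS request over UDP and see whether anything answers. The sketch below is only an illustration: the function names, the `probe` user, and the 3 second timeout are my own assumptions, and `source_ip` only steers the probe out a particular WAN if the machine running it is routed that way. Any SIP reply at all, even an error response, shows the signaling path is open.

```python
import socket
import uuid


def build_options(target_host: str, local_ip: str, local_port: int) -> bytes:
    """Build a minimal SIP OPTIONS request (hypothetical 'probe' user)."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # branch values start with this magic cookie
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{local_ip}:{local_port}>",
        "Content-Length: 0",
        "",
        "",  # the two empty entries produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def sip_reachable(target_host: str, port: int = 5060,
                  timeout: float = 3.0, source_ip: str = "") -> bool:
    """Return True if any SIP response comes back before the timeout."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.bind((source_ip, 0))          # "" lets the OS pick the address
            sock.connect((target_host, port))  # fixes the peer for send/recv
            local_ip, local_port = sock.getsockname()
            sock.send(build_options(target_host, local_ip, local_port))
            reply = sock.recv(4096)
    except OSError:  # covers timeouts, refused ports, and DNS failures
        return False
    return reply.startswith(b"SIP/2.0 ")
```

Running `sip_reachable("customer feature server address")` once per WAN would show whether web traffic works while SIP does not, which is the situation described above. It says nothing about RTP, which would need its own check.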
- Set up the same Priority algorithm but make it protocol-aware, meaning I could pick the protocols that I want monitored, so that if one of them is “down” or “undetected” for “X” amount of time, the failover happens. That way failover would be based on both “Disconnect” and “Protocol Disconnect”.
Destination: "customer feature server"
Priority Order: FIOS - SIP, RTP, MGCP, H.323: If not detected for 15 seconds, fail over to CV or TWC. Upon recovery for at least 120 seconds, revert back to FIOS.
Terminate Sessions On Link Recovery: CHECK
Something along these lines.
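To make the timing in that rule concrete, here is a minimal sketch of the logic in Python. It is not a Peplink feature or API; the class name, field names, and the `"primary"`/`"backup"` labels are all hypothetical, and `primary_ok` stands for whatever protocol-level check ends up being available. It only models the 15 second fail-over and 120 second revert timers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProtocolFailover:
    """Hypothetical protocol-aware failover timer (illustration only)."""
    down_after: float = 15.0      # seconds undetected before failing over
    recover_after: float = 120.0  # seconds of continuous health before reverting
    active: str = "primary"
    _bad_since: Optional[float] = None
    _good_since: Optional[float] = None

    def observe(self, now: float, primary_ok: bool) -> str:
        """Feed one check result for the primary link; return the link to use."""
        if primary_ok:
            self._bad_since = None
            if self._good_since is None:
                self._good_since = now  # start of the current healthy streak
            if (self.active == "backup"
                    and now - self._good_since >= self.recover_after):
                self.active = "primary"
        else:
            self._good_since = None  # any failure resets the recovery timer
            if self._bad_since is None:
                self._bad_since = now  # start of the current outage
            if (self.active == "primary"
                    and now - self._bad_since >= self.down_after):
                self.active = "backup"
        return self.active


fo = ProtocolFailover()
fo.observe(0, True)     # "primary"
fo.observe(10, False)   # "primary", protocol has only just gone missing
fo.observe(25, False)   # "backup", undetected for 15 seconds
fo.observe(30, True)    # "backup", recovery timer starts
fo.observe(150, True)   # "primary", healthy for 120 seconds
```

Terminating sessions on link recovery is not modeled here; it would happen at the moment `active` flips back to `"primary"`.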