SIP traffic suddenly stops responding?

I’ve been seeing this issue across multiple routers now (core, Balance 20, etc.).

When we use it for SIP traffic, every couple of weeks the SIP traffic will all of a sudden stop and the phones report a 408 timeout to the server. The server is pingable and all other services are fine, just no SIP traffic can pass. Rebooting the router clears it for another couple of weeks.

Anyone seen anything like this?

I have SIP running over a B580, B210, and B1 in production and have not seen that issue.

What can you see in the network captures? Are you using an FQDN for the SIP server or an IP address in the phone configs? Maybe it’s DNS related?

Have you tried disabling SIP ALG in the support.cgi page?

Many hosted SIP providers check this first when a router is acting as an application layer gateway; many session border controllers simply don’t like it.

One provider I work with insists on ALG being disabled on any device we connect to them with.

Ohhh. SIP ALG would definitely do it. I didn’t realise it was enabled by default. I’ll try this.

We are using an IP address, so it wouldn’t be DNS related. It happens at multiple sites with different FW too.

CJ

Tried SIP ALG disabled. 5 days later the same thing happened. All our VoIP phones all of a sudden showed “SIP 408 not reachable”. I could still ping the server, access the web gui, etc. Just no SIP traffic was allowed to pass. Rebooting the router brings everything back up right away for 5 days or so. Running firmware 7.1.0. Anything else I can try?
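Since ping and the web GUI still work while SIP is down, one way to test SIP reachability specifically is to send a SIP OPTIONS request over UDP from a machine behind the router and see whether any reply comes back. The following is only a minimal sketch under assumptions: the function names `build_options` and `probe`, the user name `probe`, and port 5060 are illustrative, and the server may be configured to ignore or reject OPTIONS from unknown senders.

```python
import socket
import uuid


def build_options(server_ip, server_port=5060, local_ip="0.0.0.0",
                  local_port=5060, user="probe"):
    """Build a minimal SIP OPTIONS request (names and defaults are illustrative)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 branch magic cookie
    lines = [
        f"OPTIONS sip:{server_ip}:{server_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{server_ip}>",
        f"Call-ID: {uuid.uuid4().hex}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(server_ip, server_port=5060, timeout=3.0):
    """Send one OPTIONS request; return the reply's status line, or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((server_ip, server_port))
        local_ip, local_port = sock.getsockname()
        sock.send(build_options(server_ip, server_port, local_ip, local_port))
        try:
            data = sock.recv(4096)
        except OSError:  # timeout or ICMP unreachable: no SIP reply arrived
            return None
    return data.split(b"\r\n", 1)[0].decode("ascii", "replace")
```

Calling `probe("<server IP>")` while the phones show 408 and again right after a reboot would show whether replies on the SIP port are the only thing being lost. A `None` result alone does not prove the router is at fault, since the server may simply not answer OPTIONS.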

What is your internet source? Rebooting the router would also cause your device to re-initialize with its facing modem if there is one. Possibly there is some other issue on your LAN, and rebooting causes the ARP table to re-initialize. I use SIP with numerous Peplink routers. It’s just not an issue.

@roadrnnr1

Are you able to perform a packet capture from your Balance device when the problem occurs? A packet capture should tell the story.

What is the make and model of your VoIP phones?
I’ve seen this with Aastra i53-55-57.
It’s something to do with how Aastra misinterprets timeouts and stops trying to reconnect.
In my case, failing over between WANs or cycling through them by manually disconnecting helped in 90% of the cases.
This caused us to switch to Polycoms and phase out the Aastras.

They are Aastra phones actually. That’s really strange that it happens to multiple Peplink routers and only every now and then. I’ve never seen this with any other router and we’ve deployed thousands of Aastras. It must be something in Aastra => Peplink?

What method is easiest for a packet capture? Can it be run right on the router? When I run one on the VoIP server I don’t see any packets destined for it. Strangely enough, HTTP and FTP work to the same server during the time the phone is down.

@roadrnnr1

You can do a packet capture from the support.cgi page on the Balance 20. You’ll need to change index.cgi to support.cgi in the address bar. The address should change from <IP_ADDRESS>/cgi-bin/MANGA/index.cgi to <IP_ADDRESS>/cgi-bin/MANGA/support.cgi

The packet capture is limited to 20 Mb on the device. If you check the Remote Capture option the limit becomes the size of the hard drive you’re sending the capture to.

Then you got hurt by the Aastra issue. We spent a lot of time on it.
Our cure is:

Force Aastra to failover between WANs and reboot Phones
Swap Aastra with Polycoms

All other solutions were a waste of time (crossed out) learning.

Aastra just stops retrying SIP registration. You need to force it via the methods above or similar.

I don’t think this is purely an Aastra issue. We’re a hosted PBX provider with thousands of Aastras deployed, and this only shows up when we are using Peplink routers.

I have a router now that’s exhibiting this exact behaviour. We have 13 phones behind a Peplink router. 3 of them are showing 408 not found. Wireshark shows that the SIP REGISTER packets are getting to the server, and the server is responding to the external IP of the Peplink, but the Peplink doesn’t pass the packets back to the internal IP. It’s almost like a NAT issue of some sort. If we reboot the Peplink, the 3 problem phones will register just fine, but then another 2 or 3 phones will randomly drop offline with 408 and the same issue happens.
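To make the "responses reach the WAN side but never reach the phone" observation repeatable, the SIP messages from a WAN-side capture and a LAN-side capture can be compared by Call-ID and CSeq. This is just a rough sketch: it assumes the SIP payloads have already been exported from the captures as text, and the function names `parse_sip` and `dropped_responses` are hypothetical helpers, not part of any Peplink or Wireshark tooling.

```python
def parse_sip(raw):
    """Return (is_response, start_line, call_id, cseq) for one raw SIP message."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    start = lines[0]
    headers = {}
    for line in lines[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    # "i" is the compact form of Call-ID; CSeq has no compact form.
    call_id = headers.get("call-id") or headers.get("i")
    return start.startswith("SIP/2.0 "), start, call_id, headers.get("cseq")


def dropped_responses(wan_msgs, lan_msgs):
    """Responses seen on the WAN side with no matching response on the LAN side."""
    def response_keys(msgs):
        keys = {}
        for raw in msgs:
            is_response, start, call_id, cseq = parse_sip(raw)
            if is_response and call_id and cseq:
                keys[(call_id, cseq)] = start
        return keys

    wan = response_keys(wan_msgs)
    lan = response_keys(lan_msgs)
    return {key: status for key, status in wan.items() if key not in lan}
```

Anything returned by `dropped_responses` is a reply that arrived at the router's external side but was never seen by the phones, which is the pattern described above. An empty result would instead point toward the server or the phones.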

Try forcing the SIP devices to fail over between WANs; also try manually disconnecting and reconnecting the WANs one at a time.

Hm, we ran into SIP issues; not sure if this is related or will help.
On the Balance 710/MF750, SIP traffic sometimes just stops passing, and we saw software SIP clients fail to register or registered SIP desk phones not ring.
A few things seemed to contribute to this issue:

  1. Firmware 7.0.1-2: the Jumbo frame check box on the LAN was hidden but jumbo frames were enabled by default, and it looks like a frame size mismatch caused SIP desk phones not to ring.
  2. Disabling MediaFast on the MF750 seemed to help, though we have no explanation why.
  3. Across 7.0.1-7.1.1, the Bria software SIP client sometimes fails to register, and rebooting the GW fixes the issue.

@roadrnnr1 and @astryukov

Please open a support ticket to allow the support team to check on the issue.

https://contact.peplink.com/secure/create-support-ticket.html

I had a similar issue before with Balance One devices and Polycom 410 devices. I’ll have to search my memory for how it was resolved…

I know this is a year-old thread but I have this exact same issue. It only happens about once every other month. I have never experienced this issue with any other type of router.

Was a solution found for this?

@Ben_Uecker

For the SIP connection issue, please open a support ticket so the support team can check. SIP connections are complex, so we need to verify which part is causing the issue and what fine-tuning the SIP application needs, especially when the application runs over multiple WAN connections.

Ben_Uecker
We had a ticket open for over a year, but the problem is that even with a special debugging firmware version, support was unable to find the root cause of this issue.
It happens every 3-6 weeks.
A reboot fixes this for sure, but the feeling when our call center goes down is not good.
P.S. We migrated our Cloud PBX solution to a different provider over this summer and have not yet seen another incident.
