Outbound Policy + VLAN bug in 8.5 and 8.4.1

This is a followup to several prior threads:
Link1
Link2

Summary: In firmware 8.5, there is a bug when using Outbound Policy + VLANs, which manifests as devices being unable to connect to each other and/or to get DNS. Reverting to 8.4.1 seems to fix the issue (but read on: 8.4.x appears to have the bug as well).

This seems primarily to affect devices using an ethernet port set to VLAN Access, when the VLAN is also set to use Outbound Policy specific to one WAN.

New information: After giving up on firmware 8.5, I went back to firmware 8.4.1 (on my Balance One). Although this seemed to immediately fix the issues, after a few days, I’m seeing the problems again!

The pattern is interesting:

  • Firmware 8.5: the bug manifests every few hours, and lasts for about an hour, then magically clears up
  • Firmware 8.4.1: the bug manifests every few days, and lasts for several hours.

My conclusion is that firmware 8.4.1 also suffers from the same issue, but it doesn’t happen as regularly. Something in the firmware 8.5 version is triggering the bug more quickly.
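To put numbers on "how often" and "how long", one option is to log a periodic pass/fail check from an affected device and collapse the log into outage windows. The sketch below is only an illustration: `outage_windows` is a hypothetical helper, and the `(timestamp, ok)` sample format is an assumption, not something Peplink provides.

```python
from typing import Iterable, List, Tuple


def outage_windows(samples: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, ok) samples into (first_fail, last_fail) windows.

    Samples are assumed to be sorted by timestamp. Each window covers one
    run of consecutive failed checks.
    """
    windows: List[Tuple[float, float]] = []
    start = None      # timestamp of the first failure in the current run
    last_fail = None  # timestamp of the most recent failure in the current run
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            # A successful check closes the current failure run.
            windows.append((start, last_fail))
            start = None
    if start is not None:
        # The log ended while checks were still failing.
        windows.append((start, last_fail))
    return windows
```

The gap between one window's end and the next window's start gives the interval between episodes, and `end - start` gives a lower bound on each episode's duration.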

What is going on?

I have done at least one packet capture showing the specific bug: an IOT device on my VLAN is trying to do DNS lookups, and being told that Port 53 is not available. See Packet capture

The way the bug comes and goes at random makes me wonder whether a service on the Peplink router is getting stuck (which triggers the bug) and then eventually crashes and is restarted (which clears it up). What service? Is this a NAT service? DNS? Firewall? Unclear.

In any case, since I can now convincingly demonstrate the issue both in firmware 8.4.1 and 8.5, I’m going to go back to firmware 8.5 and see if I can isolate the problem further. I’ll report back when I know more.

Hi…
Did you try the 8.4.1s032?

Maybe Peplink has fixed this in s032.

I have not - since I’m not using SpeedFusion at all, I wonder if it’s relevant to my issues?

Hi…

Only Peplink can answer your question. We Peplink users, resellers, etc. don’t know exactly what they fixed.

But given what you wrote, maybe you can try 8.4.1s032 and see what happens. We know that 8.5.0 and 8.5.1 have some bugs. We also know they fixed the LAN issue in 8.4.1s032. What we don’t know is whether they fixed anything else.

@soylentgreen ,

Can you please create a ticket here for support team to check (Please ATTN to sitloongs) ? We may need to collect the logs from your device to analyze the issue.

We tried to reproduce the issue in the lab, and so far we have not seen it. Based on the description, it may be an intermittent issue. In such cases, we need to review the logs before we can draw a conclusion.

I will create a ticket. If I include diagnostic logs - is it important that the log be created while the problem is occurring? Or can I download a log when it’s acting normal?

@sitloongs
Ticket created: 25020175. I uploaded the diagnostic log from right now (when the problem is not happening, firmware 8.4.1).

Update: I uploaded another diagnostics report right now, when the issue is happening.

@sitloongs Is it related to DNSMASQ crashing or something?
I’ve noticed something similar, but wasn’t able to pinpoint it as well as @soylentgreen has.

Another update: I think there are in fact two different issues.

  1. When you reboot into 8.5.x after running 8.4.x, there is a DNS problem. This is apparent immediately after the reboot. I’ve seen this happen consistently several times, and made packet captures showing this. Fiddling with DNS settings seems to fix this bug and it doesn’t recur.

  2. A second, more confusing and sporadic issue in which devices on the VLAN cannot see each other. It mainly affects IOT devices using UDP: in my case a Philips Hue (ethernet) and an Apple TV (WiFi) serving as a HomeKit hub, which loses its connection. Having an Outbound Policy for the VLAN seems to be related. This one is very hard to debug, as it seems to happen only every few days, lasts for an hour or so, then recovers.

Since Firmware 8.5.2 is now in beta I’ve upgraded and will report back if I see any changes.

Boy do I hope 8.5.2 solves these problems. I manage many routers that are in suspense, waiting for a stable release (for the first time since I purchased them).

Sadly, it does not. I’m still seeing problems in 8.5.2.

At the moment, I see two issues:

  1. Two IOT devices on a VLAN (Apple TV as a HomeKit hub on WiFi, and Philips Hue on ethernet) can’t see each other:

image
Even though another device on WiFi can ping both devices w/o problem.

  2. Another IOT device, which supports both local (device-to-device) and remote (via the IOT’s cloud server) connections, cannot seem to make the local connection, even though both ends are on the same VLAN. I can ping this device from a third device.

I believe that all of these devices use Multicast UDP to find each other. Perhaps that’s failing?

Is there an easy way from the command line to simulate a multicast UDP broadcast and see if it’s working?
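One rough way to check this from a computer on the same VLAN is to send an mDNS service-enumeration query to the multicast group and list which hosts answer. This is only a sketch: it assumes the devices in question respond to mDNS (224.0.0.251, UDP 5353), the function names are made up for illustration, and on a multi-homed machine the query leaves via the default interface unless you bind to a specific one.

```python
import socket
import struct
import time

MDNS_GROUP, MDNS_PORT = "224.0.0.251", 5353


def build_ptr_query(name: str = "_services._dns-sd._udp.local") -> bytes:
    """Build a one-question DNS query (QTYPE=PTR, QCLASS=IN) for `name`."""
    header = struct.pack("!HHHHHH", 0, 0, 1, 0, 0, 0)  # id=0, flags=0, 1 question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 12, 1)


def mdns_responders(wait: float = 3.0) -> set:
    """Multicast the query once and return the source IPs of all replies."""
    found = set()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # TTL 1 keeps the query on the local segment.
        s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        s.sendto(build_ptr_query(), (MDNS_GROUP, MDNS_PORT))
        deadline = time.monotonic() + wait
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            s.settimeout(remaining)
            try:
                _data, (addr, _port) = s.recvfrom(9000)
            except socket.timeout:
                break
            found.add(addr)
    return found
```

Running `print(mdns_responders())` while things work and again while they are broken shows whether a device such as the Hue bridge stops answering. Because the query is sent from an ephemeral port, replies come back as unicast, so this exercises multicast delivery of the query only, not of the responses.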

Thank you for documenting your experiences. Maybe Beta 2, RC1, or the production release will have that fix. Communication issues across VLANs would clobber my networks.

Same thing has happened again, running 8.5.2 5739.

After working normally for about 10 days, I noticed my hot tub had gone offline again. I could connect to it over the VLAN, but it could not connect from the VLAN over WAN to the cloud server. Reason: DNS was failing.

Here are my DNS settings:
Network
image

VLAN-Specific:


This setting works, but fails after about 10 days:

To work around the bug, I need to hard-code the DNS server IPs:

I don’t know what’s exactly going wrong, but it’s repeatable.

My theories:

  • DNS server for the VLAN is crashing (but perhaps only for VLAN Ethernet devices?)
  • Firewall starts blocking DNS requests (but only for Ethernet VLAN devices?)
  • Gremlins?

Is this still a problem in 8.5.2 RC2?

I’ve been running 8.5.2 RC2 for 2 days now and don’t see the bug, but that doesn’t mean much: last time the bug didn’t show up until I had about 10 days of uptime :frowning:

Running 8.5.2 release version, and the bug is back:

One IOT VLAN device (in.touch gecko dongle connected via ethernet) is not getting DNS, even though other devices on the network can get DNS, and can ping this device.

Last time this happened, I did packet captures which looked like this:

No.  Time      Source       Destination  Protocol  Length  Info
453  0.794275  10.0.64.104  10.0.64.1    DNS       82      Standard query 0x1234 A intouch2.geckoal.com
454  0.794441  10.0.64.1    10.0.64.104  ICMP      110     Destination unreachable (Port unreachable)

In other words, the IOT device (on the VLAN, with address 10.0.64.104) is sending a DNS query to the Peplink (10.0.64.1), and instead of answering, the Peplink rejects it with an ICMP "port unreachable".
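For anyone who wants to check for the same symptom without a packet capture, here is a minimal sketch that sends a single DNS A query over UDP and reports whether the server answered, timed out, or refused the port. The names `build_query` and `probe` are hypothetical, and it assumes the check is run from a machine that is itself affected (in my case only some devices on the VLAN were). On a connected UDP socket, an ICMP port unreachable normally surfaces as a "connection refused" error (a "connection reset" on Windows).

```python
import socket
import struct


def build_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a standard recursive DNS query for the A record of `name`."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD flag set, 1 question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server: str, name: str, port: int = 53, timeout: float = 2.0) -> str:
    """Send one query and classify the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        # connect() on UDP lets the OS report ICMP errors back to this socket.
        s.connect((server, port))
        try:
            s.send(build_query(name))
            s.recv(512)
            return "answered"
        except (ConnectionRefusedError, ConnectionResetError):
            return "port unreachable"
        except socket.timeout:
            return "timeout"
```

For example, `print(probe("10.0.64.1", "intouch2.geckoal.com"))` asks the router's VLAN address from the capture above. A result of "port unreachable" would match the ICMP reply in the capture, while "timeout" would point more toward packets being silently dropped.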

I opened a new ticket this morning: 25041234. The device is currently in the broken state with Remote Access enabled.

@sitloongs I hope someone from Peplink can diagnose it this time?

An update: Peplink engineers were able to remote in to my Balance One and see it in the nonfunctional state. They seemed to think the DNS proxy was in a bad state; they restarted the DNS proxy and the issue went away. Clearly, that’s only a short-term fix, but I’m cautiously optimistic they’ll be able to isolate the root cause and provide a firmware update that fixes it. TBD.

I keep thinking our devices have this issue. Can you share your ticket number for reference?
Did this problem exist on 8.3 as well?
If the VLAN devices use public DNS, does the problem still happen?

@Jonathan_Pitts the ticket # is 25041234

Changing DNS settings often seems to fix it temporarily.

I’ve seen this behavior in 8.4.x and 8.5.x - can’t say if it was in 8.3 or not.

Of note, several of us noticed that it showed up immediately upon rebooting after an upgrade from 8.4.x to 8.5. See discussion here: VLAN Not Getting External IP Access

@soylentgreen @Jonathan_Pitts ,

I will help to investigate the issue as well.

@Jonathan_Pitts

If you have a device experiencing the issue, please create a ticket and mark it ATTN to me.
