DNS Proxy bug in firmware 8.5.1

This is a follow-up to "Outbound Policy + Firewall + VLAN bug in firmware 8.5."
Evidence now suggests this is some sort of DNS Proxy bug, so I decided to start a new thread here.

Problem Overview
Some combination of the following factors is causing DNS lookups to fail:
Hardware

  • Balance One running firmware 8.5 or 8.5.1
  • An IoT device (Gecko In.touch) connected via Ethernet to a VLAN access port

VLAN

  • Assign DNS server automatically: ON
  • Inter-VLAN routing: enabled
  • The IoT device is connected to a port on the Balance One set to Access mode for the VLAN
  • DHCP enabled
  • The IoT device has a static DHCP reservation

Firewall Rules

  • A firewall rule that blocks all packets from the VLAN to the untagged LAN

DNS Proxy Settings

  • Enable: ON
  • DNS Caching: ON

Failure Mode

  • DHCP: upon boot-up, the IoT device receives its reserved IP address
  • DNS: the DNS server is set to the IP address of the Balance One
  • The IoT device is unable to perform a DNS lookup. A packet capture shows the Peplink replying with an ICMP error: Destination unreachable (Port unreachable). A quick test sketch follows below.
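
To make this easy to reproduce, here is a minimal sketch of the failing lookup, assuming Python 3 on a test host attached to the affected VLAN. The router IP below is a placeholder; substitute the Balance One's address on your VLAN. On Linux, the ICMP Port unreachable reply surfaces as ConnectionRefusedError on a connected UDP socket.

```python
import socket
import struct

ROUTER_IP = "192.168.10.1"  # placeholder: the Balance One's IP on the VLAN

def build_dns_query(hostname):
    """Build a bare-bones DNS A-record query (ID 0x1234, recursion desired)."""
    header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    qname = b"".join(bytes([len(p)]) + p.encode() for p in hostname.split("."))
    return header + qname + b"\x00" + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(3)
sock.connect((ROUTER_IP, 53))  # connect() so ICMP errors surface on recv()
sock.send(build_dns_query("example.com"))
try:
    reply = sock.recv(512)
    print(f"DNS proxy answered ({len(reply)} bytes) -- lookup path is working")
except ConnectionRefusedError:
    print("ICMP Port unreachable -- matches the failure mode above")
except socket.timeout:
    print("No reply (timeout) -- query or answer is being dropped")
```

Running the same probe from a host on the untagged LAN versus the VLAN would also show whether the proxy is refusing only the VLAN.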

Regression Testing

  • Rebooting into firmware 8.4: fixes the issue.
  • Turning off "Assign DNS server automatically" and hard-coding my ISP's DNS servers: fixes the issue (see the differential test sketched below).
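
To separate "the DNS proxy is refusing queries" from "upstream DNS is broken," here is the same probe pointed at both the router and a hard-coded upstream resolver. This is a sketch only: the router IP is a placeholder, and 8.8.8.8 stands in for the ISP's DNS servers mentioned above.

```python
import socket
import struct

# Minimal DNS A-record query for example.com (ID 0x1234, recursion desired)
QUERY = (struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
         + b"\x07example\x03com\x00" + struct.pack(">HH", 1, 1))

for server in ("192.168.10.1", "8.8.8.8"):  # router (placeholder) vs upstream
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(3)
    sock.connect((server, 53))
    sock.send(QUERY)
    try:
        sock.recv(512)
        print(f"{server}: answered")
    except ConnectionRefusedError:
        print(f"{server}: refused (ICMP Port unreachable)")
    except socket.timeout:
        print(f"{server}: timeout")
    finally:
        sock.close()
```

If the upstream answers while the router refuses, the problem is in the proxy itself, matching the regression result above.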

More regression testing to follow

Will the Peplink reply to DNS queries from other devices during this behavior?

When I get into really subtle bugs like this, packet captures with the -e argument (which prints link-layer MAC addresses) are also useful to confirm that the Peplink router itself is sending the ICMP, rather than some other misconfigured device. I have even seen bugs so subtle that the pcap from the router did not expose the suspect traffic, and a switch span/mirror port was required to see it.
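
For pcaps that were already taken without -e, the MAC addresses are still in the capture file; here is a quick sketch that pulls them out, assuming scapy is installed (pip install scapy) and using capture.pcap as a placeholder filename:

```python
from scapy.all import rdpcap, Ether, IP, ICMP, DNS

for pkt in rdpcap("capture.pcap"):
    if not (pkt.haslayer(Ether) and pkt.haslayer(IP)):
        continue
    if pkt.haslayer(ICMP) and pkt[ICMP].type == 3 and pkt[ICMP].code == 3:
        # ICMP Destination unreachable / Port unreachable
        print(f"ICMP unreachable: {pkt[IP].src} ({pkt[Ether].src}) -> "
              f"{pkt[IP].dst} ({pkt[Ether].dst})")
    elif pkt.haslayer(DNS) and pkt[DNS].qr == 0:
        # Outbound DNS query
        print(f"DNS query:        {pkt[IP].src} ({pkt[Ether].src}) -> "
              f"{pkt[IP].dst} ({pkt[Ether].dst})")
```

If the source MAC on the ICMP errors matches the Balance One's LAN MAC, that rules out another device answering in its place.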

During this behavior, I believe the problem was limited to my two IoT devices, both connected to Ethernet ports on the Balance One set to VLAN Access. No other devices were having DNS failures.

Also, the packet capture (see here) shows that the DNS queries were sent to, and the ICMP failures came back from, the IP address of the Balance One, so I don't think another device could be to blame.

Drat!

I've set up a nice test system where I can power-cycle the IoT device remotely, and now the bug is not manifesting, even though I'm pretty sure the conditions are identical to last night's.

Since it's Monday AM and people are using the network now, I can't do much more testing at the moment.

I'm at a loss as to what caused the issue (and what fixed it). Maybe when you reboot from 8.4 into 8.5.x, the DNS server or firewall can get into a confused state where it starts denying DNS queries on the VLAN? Maybe changing DNS settings (and then changing them back) somehow cleaned up the bad state? Confusing.

Yes, but that packet capture doesn't show the Ethernet MAC addresses. These things can be ultra subtle. If it does lock up again, don't change anything: put in a network switch with a span port and look at the entire packet stream. Misplaced MAC addresses from odd ARP tables can cause things to behave in interesting ways.
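
One more check along those lines: actively probe which MAC currently answers ARP for the router's IP, to rule out a second device claiming that address. This is a sketch assuming scapy and root privileges; the IP and interface names are placeholders.

```python
from scapy.all import Ether, ARP, srp

ROUTER_IP = "192.168.10.1"  # placeholder: Balance One's IP on the VLAN
IFACE = "eth0"              # placeholder: the test host's interface

# Broadcast a who-has for the router's IP and collect every reply
answered, _ = srp(
    Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=ROUTER_IP),
    iface=IFACE, timeout=2, verbose=False,
)
for _, reply in answered:
    print(f"{ROUTER_IP} is claimed by MAC {reply[ARP].hwsrc}")
if len(answered) > 1:
    print("More than one MAC answered -- possible ARP conflict")
```

Comparing the MAC that answers here against the source MAC in the pcap's ICMP errors would confirm (or clear) the misplaced-ARP theory.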

Interesting theory. If I understand correctly, you are suggesting that the IoT device's DNS queries may have been routed to some other device, and that device was replying "Hey, I don't allow port 53"? I suppose it's possible, but that would still sound like a Peplink bug to me, as the only thing that triggered this issue in the first place was updating from firmware 8.4.1 to 8.5.0.

I have a documented bug of a Peplink B20X allowing DHCP requests to leak from a Wi-Fi VLAN to the WAN port; these would not show up in local packet captures, and only a full switch mirror/span on the WAN connection found them.

Do not rule anything out when tracking down a bug… by definition a bug has no limits and can do what it wants. It doesn't matter whose bug it is; what matters is gathering enough evidence to pin it down. Bugs can be A+B+C: change any one of the circumstances and it doesn't get triggered, but is the problem at A, or B, or C? Pcaps don't lie, until they do.

Yes, one theory is that someone else is sending those ICMP messages… if the Peplink is responding on port 53 for other IPs but not the IoT VLAN, that is something to know.

I had the B20X drop off the Starlink network… a regular pcap showed all of the traffic, but the pcap with -e (MAC addresses) showed that when the B20X dropped off, the Starlink next-hop router was sending to a different MAC address, and therefore the B20X was ignoring the return packets ("MAC not for me"). You could not find this issue without the MAC addresses.

What DNS server are you using?

The bug happened when I had the DNS Proxy setting enabled:

[screenshot of the DNS Proxy settings page with Enable checked]

I have a packet capture showing the cause of the bug: the IoT device was sending a DNS query to the Peplink router (a normal request over port 53), and the Peplink was replying that port 53 was not reachable. Link to Packet Capture

After some series of steps (which included me playing with DNS settings, and rebooting from 8.5.1 back to 8.4.1 and then back to 8.5.1), the problem went away.

My suspicion at this point is that there is some bug in firmware 8.5 and 8.5.1, possibly triggered only after upgrading from 8.4.x, where the Peplink gets confused and denies DNS requests on a VLAN for some reason.