Excessive data usage on MAX BR1 mini - Confirmed

I love the MAX BR1 mini hardware. It is the product I was looking for, and I could not wait to give it a try. After a quick setup I noticed the device was using 1.4 MB of data each day with nothing connected and InControl disabled: no health checks, nothing. The MAX BR1 mini should not be using any data while connected to the cellular network, but it was.

I opened a ticket and was asked to check everything and confirm what I had already configured on the device. I sent the diagnostic report and network captures as requested. They even had me enable InControl so that Peplink support could check the device. I changed SIMs just to see if that made a difference, and observed the same behavior with the second SIM.

I then opened a ticket with my carrier, Hologram.io, and offered the pcap files for them to review. The next day I got a very detailed analysis of what they found in the files. The MAX BR1 mini was the source, and I am now waiting for Peplink to come to the same conclusion and deliver a fix. Peplink should be embarrassed by the lack of knowledge of their own product, which makes for a very ineffective ticket screening process.

For those using the BR1 mini who have experienced the excessive data usage, there is hope that a solution is on the way. I hope Peplink will improve product support by not assuming the customer is an idiot who does not know how to configure the device. I do not think the hardware is the issue, and I hope this is resolved quickly. If it is, I will still purchase the BR1 mini for my solution.

I got this far because I did not give up and kept documenting what I found, to the point where anyone would draw the same conclusion: the BR1 is the source of the data. The carrier confirmed this as well. I hope some good will come from this. Watch for the response from Peplink.

Update:
The MAX BR1 mini (and, my guess, all Peplink products) runs what Peplink calls smart services, which cannot be disabled, so the device will always generate some data usage. The minimum is about 5 MB per month, assuming the device runs all the time. Initializing the service is the big hit: depending on how it is used, you could see 20 to 30 MB for that. For many this is not an issue, but for IoT and low-MB plans it is something to consider.

Did the packet captures show what destination IP(s) were the traffic culprit?

Are you using their Flexible Data plan? It looks like 1.4 MB/day could cost approximately $14/mo per device. I could see how that might add up with hundreds or thousands of BR1 Minis embedded in IoT devices. Are you able to quantify the impact further in that way? That might help convey the urgency… because at face value 1.4 MB/day hardly seems excessive.

Hi Philip - welcome to the forum. That’s quite a first post…

As Erik mentions, 1.4 MB a day isn’t a huge amount for a typically high-bandwidth device like a BR1 Mini. You must be paying per MB for your data.

Kudos to Hologram for the assist. I’d be interested to learn what traffic the BR1 is generating - what did they say?

I’ve been working with Peplink for a while now, so I very rarely need to submit support tickets to Peplink engineering. When I do, half are me being an idiot and missing a very simple configuration step (because I’m in a rush, or because the customer I’m supporting has configured something in a totally unexpected way that I just wouldn’t have thought of). Engineering treat us all like idiots when we log tickets because often we are. Don’t take offence.

Working with them to get to the bottom of the issue is what makes the product better but also shows them that you’re not an idiot for the next time you need support.

I’m confident that now armed with the information they need, Peplink engineering will be taking a very close interest in your issue. Best of luck and do keep us updated here!

I am using the BR1 as the primary Internet access for my system, which generates very little data. I do use the Flexible Data plan, but my profit comes from efficient use of the data services. So when I saw the usage after the first day, I knew I was in trouble. I use less than 5 MB per month; $2.45 is just what I need to make this work. Today’s coders are sloppy and fix their issues with more RAM, faster CPUs and broadband. More is not always better.
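To put the numbers in this thread side by side, here is a small Python sketch of the monthly budget arithmetic. It assumes a 30-day month and uses only figures quoted in the thread (1.4 MB/day idle usage, a 5 MB/month application budget); it says nothing about Hologram's actual pricing.

```python
# Rough monthly data-budget check using figures quoted in this thread.
# ASSUMPTION: a 30-day month; "MB" is used consistently for both figures.
DAYS_PER_MONTH = 30


def monthly_mb(daily_mb: float, days: int = DAYS_PER_MONTH) -> float:
    """Project a steady daily usage figure onto a whole month."""
    return daily_mb * days


def budget_multiple(daily_mb: float, budget_mb: float) -> float:
    """How many times over the monthly budget the idle overhead alone runs."""
    return monthly_mb(daily_mb) / budget_mb


if __name__ == "__main__":
    idle = 1.4    # MB/day observed with nothing connected (original post)
    budget = 5.0  # MB/month the application itself needs
    print(f"idle overhead: {monthly_mb(idle):.1f} MB/month")
    print(f"that is {budget_multiple(idle, budget):.1f}x the {budget:.0f} MB budget")
```

At 1.4 MB/day the idle overhead is about 42 MB a month, roughly 8.4 times the 5 MB budget; the 5 MB/month minimum for the smart services mentioned in the update would by itself use up the whole budget.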

I am impressed that Hologram.io actually figured out the traffic and identified at least one issue. They cannot explain why the device is attempting the connection to 8.8.8.8 at all, but they found that the TTL is too low for it to succeed. I do have a ticket open, and I sent the same file to Peplink support. Why did it take Hologram to point this out? Where are the Peplink engineers??? The hours I spent convincing someone there was an issue were frustrating. The last update I received on the open ticket was that the issue was identified and a solution is being worked on. No details of what was discovered.

Below is the copy of the response from Hologram.io.

After review, it looks like you were able to pretty much parse out what was happening. One of the specific takeaways from the data in the pcap file is that all of the traffic (data usage) we are seeing is device initiated. That is, your device was the entity initiating a TCP connection every 10 s. Because of a problem with the packet, the network was notifying your device of the failed TCP request.

Here is the summary of what we are seeing:
Roughly every 10 seconds, your device (100.66.226.89) was trying to make an HTTPS (port 443) connection to 8.8.8.8. For reference, 8.8.8.8 is a Google Public DNS address.
After about 400 ms, the gateway device (195.226.133.50) returned an ICMP Time Exceeded message to your device.

This ICMP Time Exceeded message is only generated when the time to live (TTL) field in the IP header has reached zero (0). So, in this case, the router at 195.226.133.50 is telling your device that the packet was dropped because its TTL expired. This is expected network behavior for a packet dropped with a TTL of 0.

In review of the data in the pcap file, it looks like the TCP packet sent by your device to connect to 8.8.8.8 is sent with a TTL value of 2. This is too small to allow the TCP packet to reach its destination. We would normally expect something like 32 (or larger) for this field. So, from this, it looks like the Peplink device was specifically configured to set the TTL to 2. Can you investigate your device to see if the TTL is set to a low value? And if so, configure it to 32 or higher, or to an “auto” setting?
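Hologram's explanation comes down to how routers handle the TTL field. The toy Python model below is a sketch, not a description of Hologram's or Peplink's equipment: each router on the path decrements the TTL and, when it reaches zero, drops the packet and answers with ICMP Time Exceeded. The hop counts used with it are made-up illustrations, since the real path length to 8.8.8.8 is not known from the thread.

```python
from typing import Optional, Tuple


def send(initial_ttl: int, routers_on_path: int) -> Tuple[str, Optional[int]]:
    """Toy model of one IPv4 packet crossing a path of routers.

    Returns ("delivered", None) if the packet reaches the destination, or
    ("time_exceeded", hop) naming the router that dropped it. Each router
    decrements the TTL by one before forwarding; if the result is zero it
    drops the packet and sends ICMP Time Exceeded back to the sender.
    """
    ttl = initial_ttl
    for hop in range(1, routers_on_path + 1):
        ttl -= 1
        if ttl <= 0:
            return ("time_exceeded", hop)
    return ("delivered", None)


# ASSUMPTION: 10 routers between the modem and 8.8.8.8 (illustrative only).
print(send(2, 10))   # the low TTL seen in the capture
print(send(32, 10))  # the value Hologram suggested
```

With a TTL of 2, any path with more than one router in front of the destination ends at the second router, which is consistent with the ICMP replies Hologram saw; a TTL of 32 comfortably clears a path of ordinary length.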

That does sound like a device/firmware issue… Out of the box the TTL is set to auto on the LTE interface.

Have you tried setting the TTL to 32 like Hologram suggested?

I know how frustrating it is once you’re certain the issue is with the device/firmware. I have had two tickets over the past 4 months which required a special firmware with an included fix. It took as little as 1 week and as much as 2 weeks to get that firmware update. In the second example there were two iterations of the special firmware. It also required some level of proving that the issue was legitimately with the device and not some environmental/transient thing. Having reproducibility is key, and it sounds like you can reproduce it easily. I know it’s no consolation; I just wanted to share some experience of time frames.

Based on your description, it sounds like they’re actually developing the fix/special firmware. As technologists, WE want to know exactly what the bug was and how it was fixed… (while we wait). Maybe if they had shared that, it would have given you some level of comfort that it’s actually being addressed. In the absence of an explanation of what was ‘discovered’, I understand your skepticism.

I have hope. If I did not like what I saw in the BR1, I would have returned it and moved on. The fact that I did invest the time shows I do want this to work. I have not tried setting the TTL; I just have not had a chance to do it. I think this is just one part of the issue. What process caused the initial TCP connection to Google’s 8.8.8.8? That was not one of the DNS addresses from the carrier. The device is doing that for some reason. The bad TTL setting could be the reason the system retries the connection every 10 seconds. It might only need one good transmission and the process would stop, but I kind of think there is a process running that should not be. If you turn on the health check, you will see two of each, which makes me think a test health check process was hard-coded in the active state. I am being the very loud squeaky wheel, hoping to get some attention that helps me and anyone else with the BR1.

I just pulled a PCAP… and I see similar traffic.

Does it look like this?

IP (tos 0x0, ttl 2, id 51995, offset 0, flags [none], proto TCP (6), length 44)
10.105.11.223.3111 > 8.8.8.8.443: Flags [S], cksum 0x6789 (correct), seq 0, win 0, options [mss 536], length 0
IP (tos 0x70, ttl 62, id 34886, offset 0, flags [none], proto TCP (6), length 44)
8.8.8.8.443 > 10.105.11.223.3111: Flags [S.], cksum 0x13cf (correct), seq 2920916367, ack 1, win 65535, options [mss 536], length 0
IP (tos 0x70, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
10.105.11.223.3111 > 8.8.8.8.443: Flags [R], cksum 0x7ba6 (correct), seq 1, win 0, length 0
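If you want to check your own capture for the same pattern, the Python sketch below parses the two-line-per-packet layout of `tcpdump -v` output shown above and picks out low-TTL SYNs to 8.8.8.8:443. It assumes the capture has been converted to text on a PC (for example with `tcpdump -n -v -r capture.pcap`, where `capture.pcap` is a placeholder file name). It only understands plain TCP packets in exactly this layout, and the TTL threshold of 4 is an arbitrary choice.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# First line of each packet, e.g.
# "IP (tos 0x0, ttl 2, id 51995, ..., proto TCP (6), length 44)".
# search() rather than match(), so a leading timestamp is tolerated.
HEADER_RE = re.compile(r"\bIP \(.*?\bttl (\d+),.*\blength (\d+)\)")
# Second line, e.g. "10.105.11.223.3111 > 8.8.8.8.443: Flags [S], ..."
FLOW_RE = re.compile(
    r"^\s*(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): Flags \[([^\]]*)\]"
)


@dataclass
class Packet:
    ttl: int
    length: int  # total IP length in bytes
    src: str
    sport: int
    dst: str
    dport: int
    flags: str  # tcpdump notation: "S" = SYN, "S." = SYN-ACK, "R" = RST


def parse_tcpdump_v(lines: Iterable[str]) -> List[Packet]:
    """Pair each IP header line with the TCP flow line that follows it."""
    packets: List[Packet] = []
    header: Optional[Tuple[int, int]] = None
    for line in lines:
        h = HEADER_RE.search(line)
        if h:
            header = (int(h.group(1)), int(h.group(2)))
            continue
        f = FLOW_RE.match(line)
        if f and header is not None:
            src, sport, dst, dport, flags = f.groups()
            packets.append(
                Packet(header[0], header[1], src, int(sport), dst, int(dport), flags)
            )
            header = None
    return packets


def low_ttl_syns(packets: Iterable[Packet], dst: str = "8.8.8.8",
                 dport: int = 443, max_ttl: int = 4) -> List[Packet]:
    """Outgoing SYNs to dst:dport whose TTL is too small to travel far."""
    return [p for p in packets
            if p.dst == dst and p.dport == dport and p.flags == "S" and p.ttl <= max_ttl]
```

Fed the six lines quoted above, this yields three packets, of which only the first (TTL 2, SYN to 8.8.8.8:443) is flagged.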

What is interesting is that we are close enough to 8.8.8.8 to get a reply with a TTL of 2… I think this is AT&T doing background magic (for the reply, not the traffic). They also intercept TCP 80 and TCP 443… so it could be a side effect of that.

It is definitely originating from the Peplink device… I have a LAN pcap from the same time, and it shows no 443 traffic.

I have disabled InControl management and will see if this traffic comes from that system running its own health check.

It is similar, but I get the TTL timeout. Everything is shut down (health check and InControl), yet every 10 seconds it tries to connect to 8.8.8.8, and in my case I get the extra TTL failure message. Since I am on a very limited plan, 1.4 MB a day is a deal breaker.

Understood… My PCAP was from a B20X on AT&T, so it isn’t hardware specific but a firmware issue… somebody left a testing health check in the code tree: TCP 443 rather than ping or DNS.

I have an unlimited plan on my SIM and run 1 GB/day with health checks and SpeedFusion traffic, so I would never have noticed it…

With such a short TTL, it feels like an auxiliary “is the cellular ISP alive at all?” sort of check. Any packet back would tell you something is on the other end of the cell modem.
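The “is anything there at all?” idea can be sketched with an ordinary socket. This is a hypothetical Python illustration of the concept, not Peplink's code: it sets the IPv4 TTL on a TCP socket with `IP_TTL` and treats an accepted or refused connection as proof that something answered. Unlike a router, a normal user-space socket does not hand you the ICMP Time Exceeded packet itself; how (or whether) that error reaches `connect` depends on the operating system, so the sketch only assumes it may surface as `EHOSTUNREACH`.

```python
import errno
import socket

# Results that still prove *something* responded to the SYN.
# ASSUMPTION: some stacks report an ICMP Time Exceeded reply to a SYN as
# EHOSTUNREACH; others just let the attempt time out.
ANSWERED = {0, errno.ECONNREFUSED, errno.EHOSTUNREACH}


def open_low_ttl_socket(ttl: int = 2) -> socket.socket:
    """TCP socket whose outgoing IPv4 packets carry the given TTL."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    return s


def something_answered(host: str, port: int, ttl: int = 2,
                       timeout: float = 2.0) -> bool:
    """True if the low-TTL connect attempt drew any reply at all."""
    s = open_low_ttl_socket(ttl)
    s.settimeout(timeout)
    try:
        err = s.connect_ex((host, port))
    except OSError:  # timeouts and other local failures: no answer seen
        return False
    finally:
        s.close()
    return err in ANSWERED
```

Pointed at `8.8.8.8`, port 443, with a TTL of 2 over a cellular link, this sends the same kind of SYN that shows up in the captures; running it every 10 seconds would mimic the probe interval Hologram reported.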

I did change the TTL to 32, and it fixed the TTL error, but more data was sent because the BR1 was now able to communicate with the Google public DNS server. The TTL was one issue, but something is running on the BR1. This is why you need Wireshark. The little black box will send anything its creator wants. You trust that it is clean and has no back doors, but??? There is no root access to the device. How does this pass any QA?

Thinking about it, with the low TTL, I’m pretty sure that these packets are for the latency checking algorithm.

To disable it, change the default outbound policy to anything other than lowest latency, and in Network > WAN | WAN Quality Monitoring change the setting from auto to manual, then don’t tick any of the WANs.

I’ve turned off that automatic monitoring… and will recheck the PCAP.

Yes, that stopped the traffic, and I didn’t have any Outbound Policies with the latency check.

Can you believe Peplink tech support asked me to try the Hologram suggestion? I am wondering if Peplink tech support has even verified the issue. Why would you ask me? Is it because they don’t have access to the product? Please feel free to correct me, but I am under the impression the MAX BR1 mini was designed and manufactured by Peplink. This is exactly what I described in my first post. When tech support asks the stupid questions, you lose confidence in the product. It would be different if they said they had tried it, noticed the error stopped, and wanted to see if I get the same results to validate their test unit. That is OK with me. I guess tech does not read the community messages either.

@Phil_Rush did you disable all the latency monitoring stuff yet? Did that stop the traffic you were seeing?

That was the last suggestion from tech support, and it did not change anything, because I had already changed the TTL from auto to 32. That change actually increased data usage because the connections no longer fail. Now the unit uses about 2.8 MB each day.
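A back-of-the-envelope estimate shows how much the probe packets themselves account for. This Python sketch uses the 10-second interval from Hologram's analysis and the packet lengths from the B20X capture earlier in the thread. The 56-byte ICMP Time Exceeded size is an assumption (IPv4 header + ICMP header + the original IP header + 8 bytes of its payload, the RFC 792 minimum), and the TTL 32 case assumes the exchange looks like that capture's SYN, SYN-ACK, RST. It counts raw IP bytes only and ignores how the carrier meters usage and any other traffic the unit sends, so it is a floor rather than an explanation of the 1.4 MB and 2.8 MB figures.

```python
SECONDS_PER_DAY = 86_400
PROBE_INTERVAL_S = 10      # Hologram: one connection attempt roughly every 10 s

# IP packet lengths in bytes.
SYN = 44                   # "length 44" in the capture: 20 IP + 24 TCP (incl. MSS option)
SYN_ACK = 44               # reply seen in the B20X capture
RST = 40                   # reset sent by the Peplink unit
ICMP_TIME_EXCEEDED = 56    # ASSUMPTION: 20 IP + 8 ICMP + 28 bytes of the original packet

# TTL 2: the SYN dies en route and a router answers with ICMP Time Exceeded.
FAILED_PROBE = SYN + ICMP_TIME_EXCEEDED
# TTL 32: ASSUMED to look like the B20X capture (SYN, SYN-ACK, RST).
COMPLETED_PROBE = SYN + SYN_ACK + RST


def probes_per_day(interval_s: int = PROBE_INTERVAL_S) -> int:
    return SECONDS_PER_DAY // interval_s


def daily_bytes(bytes_per_probe: int, interval_s: int = PROBE_INTERVAL_S) -> int:
    return probes_per_day(interval_s) * bytes_per_probe


for name, size in (("TTL 2", FAILED_PROBE), ("TTL 32", COMPLETED_PROBE)):
    print(f"{name}: {daily_bytes(size) / 1e6:.2f} MB/day from probes alone")
```

That comes to about 0.86 MB a day for the failing probes and about 1.1 MB a day once the handshake completes, so the probes alone are a substantial share of the reported totals but do not account for all of them.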

I changed the Outbound Policies to priority as suggested by tech and then ran the PCAP. It appeared to work, in that I did not capture any cellular packets, but the Ethernet details still show data, and that data matches my carrier’s data usage figures. I changed it back and still got no capture on the cellular connection. I think I broke the PCAP, so I only see the LAN. I believe something is running on the BR1, outside of what we can control, that uses the cellular data service. It is not trivial if you are on a low-volume IoT data plan.

No, it did not. I think some process is running that is outside of our control. In one PCAP session I found the unit trying to connect to an address owned by AWS. I hope it is part of IC2, but for all I know it is some spyware on the BR1.

Did you also disable “Network > WAN | WAN Quality Monitoring” as Martin suggested?

That is what stopped the traffic for me.

I then turned it back on since it disables the WAN quality graphs, as one would expect.

I assume that you turned InControl back off, as that will always generate some traffic.
If there is nothing in the WAN PCAP, then I can’t see what is generating the traffic. You can check whether the capture is “broken” by sending specific test traffic, e.g. ping 7.7.7.7. If that shows up in the PCAP, then it is capturing.

Have you tried running an SNMP counter against the interface?
IF-MIB::ifDescr.3 = STRING: Cellular
IF-MIB::ifName.3 = STRING: Cellular
IF-MIB::ifInOctets.3 = Counter32: 4285532132
IF-MIB::ifOutOctets.3 = Counter32: 2140867608
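If you do poll those counters, note that `ifInOctets`/`ifOutOctets` are Counter32 values that wrap back to zero after 2^32 octets; the in-octets reading above is less than 10 MB away from wrapping. Below is a minimal Python sketch of the usual modular subtraction, assuming at most one wrap between two polls (poll often enough for that to hold). Where a device exposes the 64-bit `ifHCInOctets`/`ifHCOutOctets` objects from IF-MIB, those avoid the problem; whether Peplink firmware does is not established here.

```python
COUNTER32_MODULUS = 2 ** 32


def counter32_delta(previous: int, current: int) -> int:
    """Octets counted between two polls of a Counter32, allowing one wrap."""
    return (current - previous) % COUNTER32_MODULUS


def bytes_per_day(previous: int, current: int, seconds_between_polls: float) -> float:
    """Scale the delta between two polls up to a 24-hour rate."""
    return counter32_delta(previous, current) * 86_400 / seconds_between_polls
```

For example, `counter32_delta(4285532132, 100)` gives 9,435,264 octets, where plain subtraction would go negative.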

If I get my Max Transit back on site I will see if I can make the interface go quiescent.

InControl is disabled, and I did disable Network > WAN | WAN Quality Monitoring. Since none of us have access to the OS, we are relying on their code to run the packet capture. It was working for me, but I think I managed to break it by trying different settings. I have not reset my mini yet, but that is next. The SNMP counter is new to me; I have not done anything with it, but I could give it a try. I imagine that is the same count found in the Ethernet details under …/support.cgi