DICOM over SpeedFusion or PepVPN

Hello, need community help.

I’m pretty sure Peter West dealt with sending DICOM images over a SpeedFusion tunnel. (I do have a ticket open with the Eng team, Ticket ID: 23120856.)

Unfortunately, I’m digging into very slow DICOM transfers, and I was amazed that, all things being equal, the same DICOM image sent over OpenVPN transfers about 10x faster than over SpeedFusion.

I’m just looking for advice on the best-performing SF configuration:

UDP vs. TCP, Bonding vs. Dynamic Weighted Bonding, etc.

TIA.

I have several units transferring DICOM images between onsite modalities and hosted PACS systems, and I don’t generally run into issues. However, one thing of note: DICOM/HL7 as protocols don’t do PMTU discovery. The devices will always try to ram full-size frames through the tunnel with the DF bit set. It’s been my experience that setting Fragment Packets to Always instead of DF Bit takes care of that issue when using SpeedFusion tunnels.
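If you want to see that behavior for yourself, here is a rough scapy sketch (it needs root, and the destination address is just a placeholder): it sends one full-size packet with the DF bit set and reports whether the path answers with “fragmentation needed”, answers normally, or goes silent, which is the black-hole case the modalities run into.

```python
from scapy.all import IP, ICMP, Raw, sr1

# One full-size (1500-byte) probe with DF set, like a modality that ignores
# path-MTU discovery would send. Replace the address with your PACS host.
probe = IP(dst="192.0.2.10", flags="DF") / ICMP() / Raw(b"X" * 1472)
reply = sr1(probe, timeout=2, verbose=False)

if reply is None:
    # Silence usually means a smaller-MTU hop dropped the packet without
    # telling anyone -- the classic PMTUD black hole.
    print("No reply: the full-size DF packet appears to be black-holed.")
elif reply.haslayer(ICMP) and reply[ICMP].type == 3 and reply[ICMP].code == 4:
    print("ICMP 'fragmentation needed': the path MTU is below 1500, and PMTUD could work.")
elif reply.haslayer(ICMP) and reply[ICMP].type == 0:
    print("Echo reply: a full 1500-byte frame fits end to end.")
else:
    print("Unexpected reply:", reply.summary())
```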

OpenVPN inserts itself into the TCP handshake inside the tunnel and rewrites the MSS in flight; I believe this is handled via the --mssfix flag. That prevents the TCP stalls that kill throughput, which is why it’s so much faster.

The SpeedFusion tunnel uses IPsec (ESP) on the outside. IPsec has no mechanism to interact with the traffic it encapsulates, since it only acts as an encapsulation layer, and the SpeedFusion VPN protocol itself doesn’t appear to have any option to rewrite the MSS either. So the best way to handle it is to set Fragment Packets to Always.
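For the curious, this is roughly what that in-flight rewrite amounts to. A toy scapy sketch of an MSS clamp, not OpenVPN’s actual code; the addresses, port, and 1300-byte value are placeholders:

```python
from scapy.all import IP, TCP

def clamp_mss(pkt, max_mss=1300):
    """Rewrite the TCP MSS option in a SYN, the way an in-flight MSS clamp does."""
    rewritten = []
    for name, value in pkt[TCP].options:
        if name == "MSS" and value > max_mss:
            value = max_mss
        rewritten.append((name, value))
    pkt[TCP].options = rewritten
    # Reset lengths/checksums so scapy recomputes them when the packet is rebuilt.
    pkt[TCP].chksum = None
    pkt[IP].len = None
    pkt[IP].chksum = None
    return pkt

# A SYN toward a DICOM port advertising the usual Ethernet MSS of 1460...
syn = IP(src="10.0.0.10", dst="10.0.1.20") / TCP(sport=40000, dport=104,
                                                 flags="S", options=[("MSS", 1460)])
# ...comes out of the clamp advertising 1300 instead, so the endpoints never
# negotiate segments too big for the tunnel.
print(clamp_mss(syn)[TCP].options)     # [('MSS', 1300)]
```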

::Ninja Edit::
I forgot to answer your other question.

SpeedFusion ONLY does UDP on the outside, as far as I’m aware. That allows the best control over latency and jitter, but it’s also why upper-level knobs like FEC and WAN Smoothing are needed to correct for loss: UDP will not re-request lost frames the way TCP does.

As for the protocol, Dynamic Weighted is the way I go. With regular bonding you have either a fixed packet buffer or no packet buffer, and QoS is limited to regular PFIFO queueing.

With Dynamic Weighted Bonding you get a new QoS mechanism akin to FQ-CoDel, which handles bufferbloat much better by prioritizing packets into fixed-length queues that limit overall buffer depth. It also adds a dynamic jitter buffer instead of a fixed one, so if you’ve got VoIP or another latency-sensitive application, the buffer can grow or shrink with network conditions. You can also cap its length at whatever you find acceptable.
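To make the “fixed-length queues bound the delay” point concrete, here is a toy Python sketch. It is not Peplink’s implementation; the 20 Mbps link rate and the queue depths are just example numbers from this thread:

```python
from collections import deque

LINK_RATE_BPS = 20_000_000         # assume the ~20 Mbps upload discussed here
PACKET_BITS = 1500 * 8             # full-size frame

class CappedQueue:
    """Toy drop-tail FIFO with a fixed maximum depth.

    With a bounded queue, the worst-case wait is depth * serialization time,
    instead of growing without limit as one deep buffer fills (bufferbloat).
    """
    def __init__(self, max_packets: int):
        self.q = deque()
        self.max_packets = max_packets
        self.dropped = 0

    def enqueue(self, pkt) -> bool:
        if len(self.q) >= self.max_packets:
            self.dropped += 1          # drop rather than add more delay
            return False
        self.q.append(pkt)
        return True

    def worst_case_delay_ms(self) -> float:
        return self.max_packets * PACKET_BITS / LINK_RATE_BPS * 1000.0

voip = CappedQueue(max_packets=10)     # ~6 ms worst-case queueing delay at 20 Mbps
bulk = CappedQueue(max_packets=100)    # ~60 ms worst case, more room for throughput
print(voip.worst_case_delay_ms(), bulk.worst_case_delay_ms())
```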

For instance, with VoIP and data traffic you can create two separate sub-tunnels, each with its own rules and priorities, and use outbound policies to route specific traffic over each tunnel for better control. Need a max jitter of 40 ms for VoIP? No problem, just set that on its tunnel. Higher latency and jitter can be OK for data transfers, so you may be able to give that tunnel a deeper queue to maximize throughput.

So for me, I go with SpeedFusion tunnels using Dynamic Weighted Bonding, then set my knobs and dials based on the actual data I gather from the current ISPs.


@Wayne_Eveland yes, thank you for the response.
I don’t have any issues either, with the exception of poor performance (= user experience).
And that is what I’m addressing: improving performance.

I’ll expand my answer a bit later.

@Wayne_Eveland

Let me walk through what I see.

A C-arm or ultrasound machine sends DICOM over Wi-Fi or Ethernet at MTU 1500.
The coax WAN (sometimes failing over to LTE / 5G) is MTU 1440.

I have a Balance 20X configured to send DICOM traffic over the SF tunnel to a Balance SDX: SDX LAN1 (MTU 1500) > Cisco ASA WAN > LAN switch > DICOM PACS server (MTU 1500).

Manually changing the DF-bit handling is only available in manual SF mode, and by default it is off.
I use IC2 to establish ~150 links to avoid doing it manually, and I don’t really see where I can tweak it in that scenario.
What else can I do to improve throughput?

Side note: I discovered that reducing overall traffic through the SF tunnel may help, but DICOM still has issues even when my total throughput (sum of RX and TX) in the tunnel reaches about 150-200 Mbps, and I have tried an unencrypted tunnel, which should bump tunnel throughput from 500 Mbps to 1 Gbps.
I get that other traffic contributes to congestion, but to describe how bad it is:

My SF tunnel throughput is limited on the client side by the ISP’s upload capacity (let’s say 20-30 Mbps),
and I see other applications, such as a video stream, use 50-70% of the available upload, pushing 16-20 Mbps through the tunnel, while DICOM sends a mere 200-300 Kbps; OpenVPN, meanwhile, can use the entire 20 Mbps with no issues. That leads to poor CX when it takes seconds to send a study over OpenVPN and up to 50 minutes for C-arm video, and patients and the doc have to wait. :frowning:

I also regularly have ASAs behind Balance 20X units. I believe the default ASA MSS rewrite is 1350. The part where we differ is that I configure all my L2L tunnels manually. While I do enjoy the software-defined aspect of deployment profiles, there are just way too many unknowns in dealing with various ISPs, so we need a bit more granular control.

So if you want to test the MSS issue, there are a few things you can do. First, on your ASA use “sysopt connection tcpmss 1300”. This forces the ASA to rewrite the MSS value in flight during the TCP three-way handshake: instead of the default 1350, it becomes 1300. The reasoning is that at the Auto setting the Peplink may be using 1440 for the Internet MTU; subtract 20 bytes for the outer IP header, 8 for the UDP header, and roughly 80 for the SpeedFusion header, and you’re left with about 1332 for the packet inside the tunnel.
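Spelling that arithmetic out (treat the 80-byte SpeedFusion figure as an estimate, and confirm the real numbers with the ping test below):

```python
# Back-of-the-envelope budget for the numbers above; the 80-byte SpeedFusion
# overhead is an estimate, not a published figure.
internet_mtu = 1440    # what the Peplink may use for the WAN at the Auto setting
outer_ip, outer_udp, sf_header = 20, 8, 80

inner_packet_budget = internet_mtu - outer_ip - outer_udp - sf_header
print(inner_packet_budget)            # 1332 -> largest inner packet that avoids fragmenting

# The MSS you clamp covers only the TCP payload, so the inner IP (20) and
# TCP (20) headers also have to fit inside that budget.
print(inner_packet_budget - 20 - 20)  # 1292 -> an MSS at or below this always fits
```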

You can test another way from a terminal onsite using the ping utility. Pass the “-f” and “-l” options to set the DF bit and the payload length, then manually decrement the length until you find the point where the tunnel no longer fragments; just below that is where you want to set the MSS on your ASA.

For example: ping -f -l 1350 dicom.server.ip
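If you would rather not decrement by hand, here is a rough Python wrapper around that same command. It assumes the Windows-style -f/-l flags shown above and English ping output, so adjust it for other platforms:

```python
import subprocess

def ping_df(host: str, payload: int) -> bool:
    """Send one DF-set ping of the given payload size; True if it got through."""
    result = subprocess.run(
        ["ping", "-n", "1", "-f", "-l", str(payload), host],
        capture_output=True, text=True,
    )
    # Output parsing is locale-dependent; "fragmented" matches the English
    # "Packet needs to be fragmented but DF set." message.
    return result.returncode == 0 and "fragmented" not in result.stdout.lower()

def largest_unfragmented_payload(host: str, low: int = 1000, high: int = 1472) -> int:
    """Binary-search the largest ICMP payload that fits through the tunnel unfragmented."""
    while low < high:
        mid = (low + high + 1) // 2
        if ping_df(host, mid):
            low = mid
        else:
            high = mid - 1
    return low

if __name__ == "__main__":
    payload = largest_unfragmented_payload("dicom.server.ip")
    # payload + 28 (IP + ICMP headers) = path MTU; subtract 40 for the inner
    # IP + TCP headers to get a safe MSS to clamp on the ASA.
    print(f"Largest DF payload: {payload}  ->  suggested MSS: {payload - 12}")
```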

While doing this, I would also capture a PCAP of your specific traffic between the source and destination IP addresses and see if you come across any abnormalities (be sure to grab ICMP as well, because you may miss issues if you filter on just the DICOM ports).
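For that PCAP pass, a minimal scapy sketch along these lines will flag the two things most worth checking here: ICMP “fragmentation needed” messages and retransmitted DICOM segments. The file name is a placeholder, and 104 is the standard DICOM port, so substitute whatever your PACS actually listens on (11112 is also common):

```python
from collections import Counter
from scapy.all import rdpcap, IP, TCP, ICMP

PCAP_FILE = "dicom_transfer.pcap"   # placeholder capture file
DICOM_PORT = 104                    # adjust to your PACS port

seen = Counter()
retransmissions = 0
frag_needed = 0

for p in rdpcap(PCAP_FILE):
    if ICMP in p and p[ICMP].type == 3 and p[ICMP].code == 4:
        frag_needed += 1                      # "fragmentation needed but DF set"
    elif IP in p and TCP in p and DICOM_PORT in (p[TCP].sport, p[TCP].dport):
        if len(bytes(p[TCP].payload)) == 0:
            continue                          # ignore pure ACKs
        key = (p[IP].src, p[IP].dst, p[TCP].sport, p[TCP].dport, p[TCP].seq)
        seen[key] += 1
        if seen[key] > 1:
            retransmissions += 1              # same segment carried more than once

print(f"ICMP fragmentation-needed messages: {frag_needed}")
print(f"Retransmitted DICOM segments:       {retransmissions}")
```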

Also, a secondary note: the Balance 20X itself is only capable of roughly 100 Mbps of tunnel throughput; that’s a limitation of the CPU doing the encapsulation/decapsulation. While you may have 20-30 Mbps available upstream at the ISP end, you say you’re having issues when the total tunnel reaches 150-200 Mbps, which is WAY faster than I’ve ever seen in practice with this device. If that is indeed the case, you may want to tweak some QoS settings on your devices. If your link is congested, the network stack on the modality may be seeing delayed TCP ACKs, shrinking the TCP window, and never scaling back up properly.

Here is a priority example that we use.

Are you setting the proper interface and tunnel bandwidth on your devices? These need to be set correctly so the devices have the right constraints to work against.

Here’s an excerpt from the Peplink specs for the Balance 20X

It lists unencrypted at 100 Mbps and encrypted at 60 Mbps. However, in the real world I’ve been able to push that much higher.

@Wayne_Eveland thank you for the excellent and detailed response.
I’ll play with the ASA.
And I want to mention, though it may not be super relevant, that you misread my 200 Kbps as 200 Mbps;
in other words, DICOM uses less than 2.5-5% of the available 20 Mbps upload bandwidth.

@AndrewSt I was actually responding to this

Side note: I discovered that reducing overall traffic through the SF tunnel may help, but DICOM still has issues even when my total throughput (sum of RX and TX) in the tunnel reaches about 150-200 Mbps, and I have tried an unencrypted tunnel, which should bump tunnel throughput from 500 Mbps to 1 Gbps.

Which I assume is the far end of your tunnel, and which I misread as the Balance 20X.
Have you been able to make any progress with your DICOM traffic transfer rates?

I’d be really curious to see what the tunnel status looked like while you were running the transfer, e.g., any out-of-order or lost packets.

@Wayne_Eveland I’m working with @Steve on the ticket for this.
Unfortunately, to make DICOM throughput a bit more usable we had to dance around and spend time offloading the SF tunnel (a negative in itself, as it undercuts the original promise of securing traffic in the tunnel, unbreakable connectivity, etc.).

Some of the pain we went through to remove traffic from the tunnel:

  • Some small offshore locations had the tunnel set to send all traffic over the SF VPN for many reasons, such as appearing as US traffic, carrying internal app traffic, VoIP, etc.
  • Security cameras, which are notoriously bad for cybersecurity: we had to open up the firewall and stop streaming them through the tunnel.
  • Some VNC streaming for QA, coaching, monitoring, and other tasks in the call centers.

I want to be clear that we offloaded the SDX (the center of the SF VPN star) at the data center.
Average SF tunnel usage we saw was ~120 Mbps, with peaks at ~400 Mbps, so in theory we have capacity.

After this, DICOM throughput on the upload side at the edge (clinic-to-data-center direction, within the same ~20-30 Mbps upload SF tunnel capacity) improved: it is now up from 300 Kbps to 2-3 Mbps, but still way short of the available 20 Mbps.

This improvement is a bittersweet victory, as OpenVPN is still 10 times faster and sends DICOM at the full 20 Mbps, which is what people are used to seeing.

Now I’m in @Steve’s hands and hope he will help.

No worries, I’ve worked with @Steve plenty of times. You are in good hands for sure.

I do want to say, as a side note: check and see whether the Peplink is sending duplicate packets to your DICOM endpoints. I’ve had an issue with certain software versions doing that, especially when WAN Smoothing was on.
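A quick way to check for that is to capture on the LAN side of the far-end Peplink (so you see exactly what the PACS receives) and count byte-for-byte copies. A rough scapy sketch, with the capture name and port 104 as placeholders:

```python
from collections import Counter
from scapy.all import rdpcap, IP, TCP

copies = Counter()
for p in rdpcap("pacs_lan_side.pcap"):          # placeholder capture file
    if IP in p and TCP in p and 104 in (p[TCP].sport, p[TCP].dport):
        # Key on the raw IP packet bytes: an exact byte-for-byte copy is a
        # strong hint the tunnel duplicated the packet, while a genuine TCP
        # retransmission usually differs somewhere in the headers (ID, TTL,
        # timestamps).
        copies[bytes(p[IP])] += 1

duplicates = sum(count - 1 for count in copies.values() if count > 1)
print(f"Exact duplicate packets seen: {duplicates}")
```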

I’d be really interested to see an actual PCAP of the data in question to see what specifically is going on. I’d be leaning toward a TCP window scaling issue on the traffic within the tunnel.
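When that PCAP shows up, one quick thing to pull out of it is whether window scaling and a sane MSS were actually negotiated on the SYN/SYN-ACK. A minimal scapy sketch, with the capture name as a placeholder:

```python
from scapy.all import rdpcap, IP, TCP

for p in rdpcap("dicom_transfer.pcap"):                     # placeholder capture file
    if IP in p and TCP in p and (p[TCP].flags & 0x02):      # SYN / SYN-ACK only
        opts = dict(p[TCP].options)
        print(f"{p[IP].src} -> {p[IP].dst}  flags={p[TCP].flags}  "
              f"MSS={opts.get('MSS')}  WScale={opts.get('WScale')}  "
              f"SACK={'SAckOK' in opts}")
```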