Confusion over SpeedFusion and traffic control algorithms

KieranC · May 14, 2020, 11:12am

Hi

We are having issues with internet reliability, but that aside for now, I am confused by the relationship between SpeedFusion and the algorithms that allow different types of traffic control and smoothing.

Do these all work hand in hand? That is really my question, as they sound like they conflict.

SpeedFusion allows us to combine our WANs for faster speed, but traffic control algorithms can be set, for example, to use the lowest latency connection. Do these two things work together if the WANs are bonded by the SpeedFusion technology?

And the same question for smoothing: I assume smoothing utilises the different WANs to reduce packet loss, but if the traffic control is set to lowest latency, how do they work together? Or are these things not able to work together?

I hope this makes some sense to someone.

Thank you

MartinLangmaid · May 14, 2020, 2:36pm

SpeedFusion does get confusing at times.

There is a diagram somewhere that shows the relationship. It’s a pyramid with PepVPN at the bottom as the foundation, then SpeedFusion hot failover, then SpeedFusion bonding, and then on top of bonding the traffic management approaches.

If you want a secure, easy to configure point to point VPN that can cope in a dynamic multi-WAN environment but only ever uses a single WAN (recreating the tunnel when a WAN link fails), you need PepVPN.
If you want network traffic to be able to move at a packet level between physical WANs, you need SpeedFusion hot failover (a single active WAN at any one time).
If you want to use multiple WANs at the same time for network traffic, you need bonding.
If you are bonding, you can then pick the traffic distribution approach:

Bonding - Aggregate multiple wan-to-wan links as one higher throughput tunnel. (default)
Lowest Latency - Measure the round-trip time of each wan-to-wan link every 2 seconds and select the best one to get lowest latency.
Weighted Round Robin - Send traffic to different wan-to-wan links by a ratio calculated from the user defined WAN upload / download bandwidth.
Overflow - Use the wan-to-wan link with highest overflow precedence for sending traffic. Overflow to link with lower precedence when the current one is congested, or approaching WAN upload / download bandwidth defined by user.
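
As a rough illustration of how the three non-default policies above differ, here is a minimal Python sketch. It is not Peplink's implementation: the `Link` class, its fields, the 90% congestion threshold, the fallback when every link is congested and the example figures are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    rtt_ms: float            # most recent measured round-trip time (hypothetical)
    bandwidth_mbps: float    # user-defined bandwidth for this link
    load_mbps: float = 0.0   # traffic currently being sent over the link
    precedence: int = 0      # lower number = higher overflow precedence


def lowest_latency(links):
    """Pick the single link with the smallest measured round-trip time."""
    return min(links, key=lambda link: link.rtt_ms)


def weighted_round_robin_shares(links):
    """Share of traffic per link, proportional to its defined bandwidth."""
    total = sum(link.bandwidth_mbps for link in links)
    return {link.name: link.bandwidth_mbps / total for link in links}


def overflow(links, threshold=0.9):
    """Use the highest-precedence link that is not close to its defined bandwidth.

    The 0.9 threshold is an assumed stand-in for "congested or approaching
    the defined bandwidth".
    """
    ordered = sorted(links, key=lambda link: link.precedence)
    for link in ordered:
        if link.load_mbps < threshold * link.bandwidth_mbps:
            return link
    return ordered[-1]  # assumed fallback: every link is congested


# Made-up example links; none of these figures are measurements.
links = [
    Link("fibre", rtt_ms=18, bandwidth_mbps=4, load_mbps=3.9, precedence=0),
    Link("wireless", rtt_ms=12, bandwidth_mbps=15, load_mbps=2.0, precedence=1),
    Link("4g", rtt_ms=45, bandwidth_mbps=21, precedence=2),
]

print(lowest_latency(links).name)
print(weighted_round_robin_shares(links))
print(overflow(links).name)
```

The point of the sketch is only that lowest latency and overflow each pick one link at a time, while weighted round robin splits traffic by ratio.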

If you need to guarantee that a session’s packets get to the other end with the lowest possible latency in a multi-WAN SpeedFusion VPN tunnel, and the data you are sending is small compared to the total bandwidth available on each WAN link, you can enable WAN smoothing, which duplicates packets but consumes n times the amount of data sent, where n is the number of copies.
If you are worried about packet loss but can’t afford to duplicate all traffic over multiple WANs, and you don’t mind a little extra latency, you can use Forward Error Correction.
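
To put numbers on the WAN smoothing trade-off, here is a small Python sketch. It shows the bandwidth cost of sending n copies and the chance that every copy of a packet is lost. The function names and example figures are hypothetical, and the loss calculation assumes losses on different WANs are independent, which real links may not satisfy. FEC is not modelled.

```python
import math


def smoothing_usage_kbps(app_rate_kbps, copies):
    """Bandwidth consumed when every packet is sent `copies` times."""
    return app_rate_kbps * copies


def residual_loss(loss_rates):
    """Probability that all copies of a packet are lost.

    Assumes one copy per WAN and independent loss on each WAN.
    """
    return math.prod(loss_rates)


# Made-up example: 200 kbps of API traffic duplicated over 3 WANs.
print(smoothing_usage_kbps(200, 3))
# Made-up example: one WAN losing 2% of packets, another losing 5%.
print(residual_loss([0.02, 0.05]))
```

Under those assumptions, duplicating small API requests costs little bandwidth while making the loss of every copy far less likely than loss on any single link.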

Now if you are using WAN smoothing set to max, where all tunnels on all WANs are used for the replicated traffic, then the SpeedFusion distribution algorithm will be ignored.

However, if you are using WAN smoothing set at a lower level and have multiple WAN links, then the duplicate packets will be distributed over the WAN links as per the rules. The same goes for traffic that has FEC applied: its packets will be distributed using those rules too.

So in the case where a lowest latency distribution is used: because latency typically rises as buffers fill when you saturate a WAN link, outbound packets will be sent over the lowest latency WAN first and then, as its latency rises, over whichever link now has the lower latency. SpeedFusion is then continuously chasing the lowest latency connection.

MartinLangmaid · May 14, 2020, 2:40pm

It is also worth mentioning that you can have up to 4 additional sub tunnels on a single point to point VPN, and each tunnel can be configured with different SpeedFusion features.

You could then, for example, use lowest latency distribution SpeedFusion Bonding for all VPN traffic and then create a sub tunnel with WAN smoothing set to MAX just for VoIP traffic.

KieranC · May 14, 2020, 10:13pm

Hi

Thanks for the info.

My primary aim is faster speed, but it must be solid and reliable; latency, I think, is a part of that.

We have software, which seems to be cloud software, that keeps crashing out, and we suspect the cause is packet loss / lost connections.

We have various different internet types and providers bonded together, but I was of the understanding that the smoothing would prevent the problems of packet loss etc. to provide a reliable connection.

So I am trying to understand the combination of features that will solve the problem. I would like to bond the lines and get around 100mb total, but it has to be reliable, without losing the connection to our software, which causes it to crash and time out.

For example, I could have SpeedFusion on with 4 WANs, smoothing set to medium and outbound rules to use the lowest latency connection, but what I need to understand is whether those are the right settings to create a stable connection and whether they work together or not. There is also the persistence setting, which I understand is useful for banking and some sites, so perhaps that could be an issue with my cloud software. But if you are using persistence, is it not just load balancing across connections rather than bonding?

The 4G signal is good, but EE seems better and more stable than Three when I have tested them independently.

Our fibre line is very slow at only 4MB / 1MB and we have a local business grade wifi from a tower we can visually see, so we are not far from it, and that gives 15mb / 15mb

Using a combination of connections, with some perhaps being less reliable and dropping more data: is that not the purpose of the product, to allow you to use all of them and still have no issues due to the smoothing making multiple requests for the same packets over all of the connections?

Or will having some providers which are not as fast / stable actually have a negative effect regardless of the Peplink technology?

My logical understanding would be that if the packet comes back from one of the stable connections it would not matter so much if you have less stable connections in your mix.

Our data is small amounts: we send requests via API etc. which are just a few KB each, so we are not using MB for each request for this cloud software.

My very basic testing seemed to show that when we removed 4G from the mix we had fewer issues. But 4G provides a faster speed, so without it, should someone be watching a YouTube video, it will impact on our slower non-4G lines. We could create traffic rules to separate them, but overall I just thought we would bond all WANs together and the router would deal with any dropouts, so we should not need to give certain services priority on a specific connection type and send high demand non-critical data such as video through other connections. To me that sounds more like load balancing than having one larger internet connection.

I have emailed you to enquire about you setting it up for us, but I thought it might be useful to share my assumptions about what I am trying to do and my basic understanding of the product.

Thanks

MartinLangmaid · May 15, 2020, 6:15am

This is the main issue that needs root cause resolution and monitoring. We need to work out where the issue lies and what the problem is. Is it latency or packet loss or both? Is that latency and packet loss on the WAN links or on the local wifi network, etc.?

Let’s chase quality first and then bandwidth second, as a high quality lower bandwidth connection is much more valuable than a high bandwidth connection that does amazingly well on speedtest.net but nothing else.

That is what WAN smoothing is designed to do. Whether it works or not will depend on the quality of the links, the available bandwidth between you and the remote peer and the amount of traffic you are trying to push over the tunnel. If you have crazy high latency on all WAN links at the same time whilst trying to upload too much data, then WAN smoothing can’t help.

Persistence is a load balancing setting, yes.

The trick here is how they are tested and what is measured. You will not be surprised to learn that mobile networks prioritise traffic to speedtest.net - and often host speedtest.net servers themselves to improve the way their service is measured by their customers. In your case, knowing how fast traffic can get to and from your mobile network’s core infrastructure is of no use. You need to test how much bandwidth you have between you and the FusionHub/remote peer, as that is the maximum bandwidth you’ll have available to you to run SpeedFusion bonding over.

Then there is latency too. Look at this:

Speedtest.net measures my latency at the beginning and says it is 14ms, but during the test my WANs went up to 274ms. The higher (and more variable) the latency the lower the throughput. I’ve seen 40Mbps of bandwidth on a cellular link that had over 2 seconds of latency - which is fine when you’re sending a UDP stream, but a disaster when running a TCP tunnel or waiting for a response from a cloud.
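
One way to see why rising latency hurts TCP in particular: a single TCP flow can have at most one window of data in flight per round trip, so its throughput is bounded by window size divided by round-trip time. The sketch below applies that bound to the latencies mentioned above. The 64 KiB window is an assumption chosen for illustration (modern stacks usually scale the window larger), and the function name is hypothetical.

```python
def max_tcp_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound for one TCP flow: one window of data per round trip."""
    bits_per_window = window_bytes * 8
    rtt_seconds = rtt_ms / 1000
    return bits_per_window / rtt_seconds / 1_000_000


WINDOW = 64 * 1024  # assumed 64 KiB window, no window scaling

for rtt in (14, 274, 2000):
    print(f"{rtt} ms -> {max_tcp_throughput_mbps(WINDOW, rtt):.2f} Mbps")
```

With that assumed window, the bound is roughly 37 Mbps at 14ms, under 2 Mbps at 274ms, and well under 1 Mbps at 2 seconds, regardless of how much raw bandwidth the link has.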

Yes, that is the idea. And so we need to investigate what the underlying problem is. Do you notice other applications suffering at the same time as your cloud app? Maybe run a ping tool to bbc.co.uk and see if that shows latency spikes and packet loss too when you have application problems. https://pinglogger.co.uk/ works well and is free.

OK, so what else is happening on the network becomes the question: what other devices are using bandwidth, etc.?
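
Whatever tool collects the pings, the two figures to look for are the loss percentage and the latency spikes. As a minimal sketch, assuming you have exported the results as a list of round-trip times in milliseconds with `None` for each lost reply, something like this would summarise them. The function name, the input format and the 200ms spike threshold are all assumptions, not part of any particular ping tool.

```python
def summarise_pings(samples_ms, spike_ms=200.0):
    """Summarise ping results; `None` entries are replies that never came back."""
    replies = [s for s in samples_ms if s is not None]
    lost = len(samples_ms) - len(replies)
    return {
        "sent": len(samples_ms),
        "loss_pct": 100.0 * lost / len(samples_ms) if samples_ms else 0.0,
        "avg_ms": sum(replies) / len(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
        "spikes": sum(1 for s in replies if s > spike_ms),
    }


# Made-up samples for illustration only.
samples = [14, 15, None, 274, 16, 18, None, 250, 15, 14]
print(summarise_pings(samples))
```

Comparing the timestamps of losses and spikes against the times the cloud application fails is what tells you whether the WAN links are the likely cause.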

When we use multiple WANs actively in a bonded VPN for reliability and bandwidth aggregation, any WAN link that is currently sending traffic and has an issue (like jitter or buffer bloat or packet loss) can have an effect on the bonded connection as a whole, since it was handling traffic that now has to be resent or arrives late. SpeedFusion will react to those WAN link changes (or really the symptoms it sees as the WAN characteristics change - latency and packet loss) after it sees them happen (by marking a link as down or by temporarily stopping transmission of data over a link). How long it waits before it starts using the WAN link again, and how best to distribute traffic over multiple WAN links that it doesn’t know the underlying state of (because these are passive, not active, measurements), has to be driven by algorithm choice and parameter defined metrics (i.e. when you see packet loss, wait for 300ms before trying to use the link again, or if there is more than 200ms of latency on a link, stop using it until latency drops again).
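
The two example rules in that last sentence can be sketched as a small state tracker per link. This is only an illustration of parameter-driven suspension, not how SpeedFusion is implemented: the `LinkState` class, its method names and the idea of feeding it explicit observations are all assumptions, and only the 300ms and 200ms figures come from the example above.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    loss_holdoff_ms: float = 300.0    # wait this long after seeing packet loss
    max_latency_ms: float = 200.0     # stop using the link above this latency
    suspended_until_ms: float = 0.0   # link is held off until this time
    last_latency_ms: float = 0.0      # most recent latency observation

    def observe(self, now_ms, latency_ms=None, lost=False):
        """Record a passive observation made while traffic was flowing."""
        if latency_ms is not None:
            self.last_latency_ms = latency_ms
        if lost:
            self.suspended_until_ms = now_ms + self.loss_holdoff_ms

    def usable(self, now_ms):
        """True if neither rule currently excludes the link."""
        if now_ms < self.suspended_until_ms:
            return False
        return self.last_latency_ms <= self.max_latency_ms


link = LinkState()
link.observe(now_ms=0, latency_ms=40)
link.observe(now_ms=100, lost=True)
print(link.usable(now_ms=250))  # still inside the 300ms hold-off
print(link.usable(now_ms=400))  # hold-off has expired
```

Tuning those two parameters per link is the kind of manual configuration that variable links tend to need.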

The only way to make this kind of WAN management fully automated is invasive. You have to consume large amounts of data in continuous proactive testing of each individual WAN link and then each tunnel passing over a pair of remote and local links, you have to tag each packet and measure its flow across the WAN links, how long it took, what the retransmissions were, then you have to do deep packet inspection on the user traffic and prioritise certain packets over others, and then you have to use external time sync methods between VPN peers to make sure that there are no collisions and that packet delivery and measurements taken at either end deliver accurate results that are calculated fast.

There are products out there that do this - or rather some of it - but they cost tens of thousands of pounds per endpoint and have very limited throughput due to the computational overhead. Not to mention that they are normally wholly unsuited to cellular and satellite links due to their expensive bandwidth overhead.

Instead, Peplink uses passive measurements of WAN link characteristics and then parameter based, partly manual configurations so that a very good result can be achieved at a more sensible price point.
If you have good quality links, it typically works straight out of the box. If you have variable links and specific traffic requirements, then it needs a bit of manual config to tease the best out of it.

Thank you yes it does help. We will be applying our ‘SpeedFusion Doctor™’ service to your setup to help you get to the bottom of this. I look forward to working with you on it.