Hi Martin,
Right now, I only have a layer 2 switch hooked up to a single WAN at my colocation. The WAN NICs on the two ESXi hosts have public IPs, so all the VMs underneath with public IPs route through the WAN NIC without issue.
1.) If the ESXi physical NIC can have a public IP while connected to the LAN port of the Balance One, the VMs underneath should be fine as well, right? That's just an extension of my example above with the L2 switch.
2.) You could have the same network on the Balance One and the layer 2 switch if only one were active at a time, right? Unless configured differently, ESXi defaults to one active adapter with everything else passive for the management/public network. In theory, the NIC connected to the layer 2 switch would only activate if the Balance One were to go offline (and consequently drop the link on the connected NIC) for some reason. Hopefully never, but just planning for the worst case.
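For reference, that active/standby ordering can be set per vSwitch from the ESXi CLI. This is just a sketch for my setup; the vSwitch and vmnic names below are assumptions, not my actual config:

```shell
# Make the Balance One uplink active and the L2-switch uplink standby
# on the management/public vSwitch (vSwitch0/vmnic0/vmnic1 are placeholders).
esxcli network vswitch standard policy failover set \
    --vswitch-name=vSwitch0 \
    --active-uplinks=vmnic0 \
    --standby-uplinks=vmnic1

# Confirm the teaming order took effect.
esxcli network vswitch standard policy failover get --vswitch-name=vSwitch0
```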
3.) This particular computing cluster is small scale with $4k in equipment costs, so it's not fancy enough to warrant that level of tuning and engineering to deal with jumbo frames. Just doing regular 1GbE LAN over prosumer hardware. If I were to try to improve storage performance, a direct 10GbE connection between the storage and the ESXi hosts is probably the best balance of cost, performance, redundancy, and configuration hassle.
I agree with you that a pair of pfSense VMs with automatic HA/vMotion turned off would be perfect here. It would be practically free and can cover all sorts of IPsec VPN, VRRP, and firewalling gadgets. There are only three problems with this approach:
A.) I’m not a network engineer. Sure, I can tell you that a VLAN is a logical segmentation of broadcast domains on the same physical switch. Knowing the theory and actually going in there to configure switch ports, trunk ports, tagging and untagging, and troubleshooting at the packet level is a whole different game. =) A site-to-site VPN between two Peplink products might take five minutes to configure. The last IPsec VPN between a Cisco router and a pfSense VM took me about two hours to get both ends happy enough to establish a session. Once connected, it took me another two hours to figure out why they were connected but I couldn’t ping anything across the tunnel. Apparently, you have to assign a gateway to the tunnel to allow communication across it, AND by default no ports or protocols are allowed through. It should be done this way, and it’s surely more secure this way, but the five-minute GUI clicker in me would be over my head for a long while. I would mind the pain less if I had unlimited time to learn all the awesome networking stuff.
B.) So uhmm, who do I call when I need help with virtual routers? I’m sure there are a ton of resources on pfSense, but you have to have enough of a base to make sense of them or solve problems effectively. Not being a networking person, sometimes you don’t even know what questions to ask to solve a particular problem.
C.) A physical appliance is much easier for other people to support. I can ask someone to plug in a network cable, power cycle a switch, or tell me which lights are green and orange. VMs are awesome and generally robust until a whack problem develops like this (rare, but it does happen): the VM is locked up. You power it off with two mouse clicks (easy!). You then go to power it back on with two mouse clicks, in theory. NOPE, vCenter says the VM is off and won’t let you power it on because an ESXi host has locked the virtual machine file. To release the lock, someone’s going to have to reboot the storage (and take down every other VM) or use the ESXi client to enable SSH. Then, with SSH enabled, you console into the OS, track down the VM file and which host and datastore it’s on, and type in some long string to release the lock at the CLI. Possibly you need to move all the running VMs to another host before rebooting the host with the stuck virtual router.
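For my own notes, the "long string at the CLI" part usually boils down to something like this once SSH is enabled on the host. Treat this as a sketch; the datastore and VM names are placeholders, not an elided path from my environment:

```shell
# On the host suspected of holding the lock: list running VM worlds
# and note the World ID of the stuck router VM.
esxcli vm process list

# Force-kill the stuck VM process so the file lock is released.
esxcli vm process kill --type=force --world-id=<WORLD_ID>

# If no host admits to owning it, dump the lock info on the .vmx;
# the owner field shows the MAC address of the host holding the lock.
vmkfstools -D /vmfs/volumes/<datastore>/<vm>/<vm>.vmx
```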
Lastly, is Drop-in mode layer 2 bridging? You mentioned you can pick layer 2 bridging or NAT but not both, right? I’ll have NAT and the private network taken care of by another device if the Peplink hardware can uplink the public IPs and load balance the WAN for them. While having the WAN ESXi NICs on LAN1 of the Balance One and the iSCSI NIC on LAN2 of the Balance One would technically be two networks, could the problem be solved this way?
1 - Put them on different VLANs.
2 - The storage network would not route at all and would have no gateway configured. It’d live on the same /24 as the storage.
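On the ESXi side, that non-routing storage network would just be a vmkernel port with a static IP and deliberately no gateway. A sketch, assuming made-up interface, port group, and IP names:

```shell
# Create a vmkernel port for iSCSI on its own port group (vmk1 and
# the "iSCSI" port group name are assumptions for illustration).
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=iSCSI

# Give it a static IP on the storage /24; no gateway is set,
# so traffic stays on that subnet and cannot route out.
esxcli network ip interface ipv4 set --interface-name=vmk1 \
    --ipv4=10.10.10.11 --netmask=255.255.255.0 --type=static
```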
Thanks again! It’s getting closer; I may need to switch to the 210 to get the functionality I’m after here.