Redundant VXLAN without MLAG

Userlevel 1
Short Version
I have seen config examples where two leaf switches are put into an MLAG pair and the VXLAN registration daemon (vxrd) then effectively runs redundantly between them. Can this redundancy be achieved without putting the leaf switches in MLAG? I would rather keep the L2 part of the network as simple as possible.

Long Version
We will effectively have 2 racks (for now) with 2 leaf switches in each, L3 up to the spines. Each rack will have a number of VMware ESX hosts in it, connected to each leaf switch via standard port-group load balancing.

We also have an L2 external network/subnet for internet access, which we need to present to all ESX servers for virtual firewalls. My old-school networking approach would have been to split the external range into smaller subnets and route them across the L3 fabric with extra VRRP instances and VRFs. However, with VXLAN I'm seeing an opportunity to do something more clever and allow more granular assignment of VMs to a rack.

I don't really have any need for MLAG and would probably prefer to avoid it, as "clever" L2 stuff tends to scare me. I would use just a simple L2 bonded link between the leaf switches for the internal rack L2 VLAN traffic. Is there any way to run VXLAN in this configuration so that VXLAN keeps working if one leaf switch is unavailable, without causing loops?


7 replies

Userlevel 4
If you are using an L2 bond between the server and the switches, you need MLAG for the switches to sync state and act as one logical switch.

Otherwise I would recommend pushing encap/decap down to the host and doing RoH (Routing on the Host). That way no MLAG is needed.
Userlevel 1
Hi Sean, I was just planning on using the standard balancing in ESX where it pins each VM to a NIC; there's no sort of bonding configured.
Userlevel 4
So if each VM is always pinned to one switch or the other, just make the two leaf switches separate VTEPs (so each VM is essentially single-attached and only fails over to the other switch) rather than active/active. Make sense?
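For reference, a separate-VTEP setup on each leaf might look roughly like this. This is only a sketch assuming Cumulus-style ifupdown2 syntax; the interface names, VNI, and addresses are illustrative, not taken from the thread:

```shell
# Leaf1 (leaf2 would look the same, but with its own unique tunnel IP, e.g. 10.0.0.12)
auto lo
iface lo inet loopback
    address 10.0.0.11/32            # leaf1's own VTEP source address

auto vni100
iface vni100
    vxlan-id 100                    # VNI carrying the tenant VLAN
    vxlan-local-tunnelip 10.0.0.11  # each leaf terminates its own tunnel
    bridge-access 100

auto bridge
iface bridge
    bridge-vlan-aware yes
    bridge-ports swp1 vni100        # host-facing port plus the VXLAN interface
    bridge-vids 100
```

Because each leaf uses its own tunnel IP, the two VTEPs are fully independent and no MLAG peering is required between them.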
Userlevel 1
Sort of. I was looking at some of the reference diagrams and how having a L2 link between 2 leafs could cause loops and the solution was to use STP to block. However in my case I need that L2 link to be active to allow traffic to flow between switches in that L2 domain.
Userlevel 4
The only way you can do this... which is not recommended, but I have seen people do it... is to turn on lacp-bypass (so the bond remains up even without an LACP partner). Does this make sense? I would highly recommend doing LACP instead of trying to 'trick' it.
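In case it helps, the lacp-bypass knob referred to above looks roughly like this in a Cumulus-style bond stanza (port names illustrative; as noted, this is a workaround rather than a recommendation):

```shell
auto bond1
iface bond1
    bond-slaves swp1
    bond-mode 802.3ad            # LACP bond toward the host
    bond-lacp-bypass-allow yes   # let the bond come up even if no LACP partner responds
```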
Userlevel 1
The problem is that I think STP might be needed to stop loops, but I'm not 100% sure how VXLAN behaves with regard to switching loops.

Below is a little diagram to show what I mean. The 2 leaf switches in each rack have an L2 link between them to form a traditional L2 network, allowing VMs to communicate no matter which switch they end up on. Then there are L3 links up to the spines. Each pair of switches also runs VRRP to offer a gateway address to the L3 network. As mentioned, the ESXi servers don't form any sort of bond.

The red line shows how I would want to use VXLAN to extend this L2 network/VLAN into another rack, so that if a VM from a VLAN in rack 1 moved into rack 2, it could still communicate with its peers in rack 1.

If I ran a VTEP on both leaf switches, would a loop form between the VTEPs on each leaf and the L2 inter-switch link?

Userlevel 4
If you have a bond from the ESXi host to the leaf switches (and it has to be a bond), it needs to be LACP so MLAG can exist between the leaf switches. Otherwise, if you are just MAC-pinning, each leaf switch can be a separate VTEP. There will be no loops either way; STP is not propagated across a VXLAN tunnel.
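Since the original question mentioned vxrd: with two independent VTEPs, each leaf simply registers itself with the LNV service nodes under its own address, so no redundancy coupling between the leaves is needed. A minimal sketch of /etc/vxrd.conf, with purely illustrative addresses:

```shell
# /etc/vxrd.conf on leaf1 (leaf2 uses its own local_addr, e.g. 10.0.0.12)
[common]
local_addr = 10.0.0.11    # this leaf's VTEP/loopback IP
svcnode_ip = 10.0.0.100   # address of the LNV service node(s) on the spines
```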