Maintaining Optimal Forwarding for Inter-Subnet Routing


I took the single-attach EVPN demo from GitHub (https://github.com/CumulusNetworks/cldemo-evpn) to explore inter-subnet routing. The demo is not set up for hosts in different subnets to communicate, so I added SVIs and advertised their subnets into BGP. From a layer 3 perspective routing seems optimal, but I am concerned about the potential for sub-optimal forwarding.

For example, if all four leaf switches are advertising the same /24 network into the L3 routing domain then from an L3 perspective all paths seem equal and viable. If traffic is forwarded to the "wrong" leaf then it can figure out where to send it from its L2 domain information, but it traverses the fabric a second time.

I have seen Juniper and Cisco offer some ways to overcome this in their equivalent solutions. Does Cumulus also have features to ensure optimal routing/forwarding occurs between subnets? And if so, what option is available?

Thanks!

2 replies

> I took the single-attach EVPN demo from GitHub (https://github.com/CumulusNetworks/cldemo-evpn) to explore inter-subnet routing. The demo is not set up for hosts in different subnets to communicate, so I added SVIs and advertised their subnets into BGP. From a layer 3 perspective routing seems optimal, but I am concerned about the potential for sub-optimal forwarding.

The feature you just described (routing in and out of a VXLAN) will work on VX because it is software-based, but on a hardware switch it currently requires RIOT / VXLAN routing support in Cumulus Linux. The demo deliberately did not show VXLAN routing. The only ASIC that can do VXLAN routing right now is the Broadcom Trident 2+, and that is in early access. More ASICs will be supported soon. You can use the Cumulus HCL to find a switch that supports this. There is also a way to do this on ASICs that are not capable of VXLAN routing, which we have internally coined "Hyperloop", but it is not recommended unless you HAVE to do it, because you burn two ports for every cable and the config is pretty complicated and not easy to consume.

I am actually a bit surprised EVPN / distributed routing worked for you if you put it on every switch. With EVPN we currently support only type-3 routes and type-2 MAC exchange (not type-2 IP exchange). We also don't support suppressing the advertisement of the VRR (virtual) address, so with EVPN each VTEP will see the virtual address behind every other VTEP, and you get flooding. I imagine that if it worked, either you were not aware of the flooding or you hit one of these scenarios:

  1. If you just put VXLAN routing on 1 VTEP (or 2 switches acting as a single VTEP) it will work.
  2. If you use LNV instead of EVPN it will work.
  3. If you did not use the same SVI (switch virtual interface) or VRR address on different VTEPs it will work.
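
To make the virtual-address problem concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not how Cumulus or EVPN is implemented: the VTEP names, the MAC addresses, and the helper functions are all hypothetical. It treats each type-2 MAC advertisement as a (VTEP, MAC) pair and reports any MAC that more than one VTEP claims, which is what happens to the shared VRR address when every leaf advertises it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical values, chosen only for illustration.
VRR_MAC = "00:00:5e:00:01:01"
LEAVES = ["leaf01", "leaf02", "leaf03", "leaf04"]


def owners_by_mac(adverts: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (vtep, mac) advertisements by MAC, keeping first-seen order."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for vtep, mac in adverts:
        if vtep not in owners[mac]:
            owners[mac].append(vtep)
    return dict(owners)


def ambiguous_macs(adverts: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return only the MACs that more than one VTEP advertises."""
    return {
        mac: vteps
        for mac, vteps in owners_by_mac(adverts).items()
        if len(vteps) > 1
    }


# Same virtual address on every VTEP, with no suppression of the advertisement.
gateway_everywhere = [(leaf, VRR_MAC) for leaf in LEAVES]

# Scenario 1 from the list above: the gateway lives on a single VTEP.
gateway_on_one_vtep = [("leaf01", VRR_MAC)]

if __name__ == "__main__":
    print(ambiguous_macs(gateway_everywhere))
    print(ambiguous_macs(gateway_on_one_vtep))
```

In this model the first case reports the virtual MAC as claimed by all four leaves, while the single-VTEP case reports nothing ambiguous, which matches why scenario 1 works.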

> For example, if all four leaf switches are advertising the same /24 network into the L3 routing domain then from an L3 perspective all paths seem equal and viable. If traffic is forwarded to the "wrong" leaf then it can figure out where to send it from its L2 domain information, but it traverses the fabric a second time.


This is what happens if you use LNV: the traffic hairpins. You have a 1 / (total VTEPs) chance of reaching the right VTEP the first time; otherwise the traffic lands on another VTEP, which then forwards it to the right one. Once EVPN type-2 IP exchange is fully supported, we can advertise the /32 (the individual server) instead of the /24, which will make this more optimal.
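
Here is a rough Python sketch of both points, assuming ECMP spreads flows uniformly across the VTEPs that advertise the /24. The leaf names, prefixes, and function names are made up for illustration. The first two functions put a number on the hairpin; the third shows why a /32 host route fixes it, since the most specific prefix wins the route lookup.

```python
import ipaddress
from fractions import Fraction
from typing import Dict, List


def first_try_probability(total_vteps: int) -> Fraction:
    """Chance that the first hop is the VTEP the destination sits behind."""
    if total_vteps < 1:
        raise ValueError("need at least one VTEP")
    return Fraction(1, total_vteps)


def expected_fabric_traversals(total_vteps: int) -> Fraction:
    """One traversal on a hit, two on a miss (the hairpin)."""
    p_hit = first_try_probability(total_vteps)
    return p_hit * 1 + (1 - p_hit) * 2


def longest_prefix_match(dest: str, routes: Dict[str, List[str]]) -> List[str]:
    """Return the next hops of the most specific route that covers dest."""
    addr = ipaddress.ip_address(dest)
    best_len, best_hops = -1, []
    for prefix, hops in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hops = net.prefixlen, hops
    return best_hops


if __name__ == "__main__":
    leaves = ["leaf01", "leaf02", "leaf03", "leaf04"]  # hypothetical names

    # Today: every leaf advertises the same /24.
    subnet_only = {"10.1.1.0/24": leaves}
    # With type-2 IP exchange: the leaf that owns the server adds a /32.
    with_host_route = {"10.1.1.0/24": leaves, "10.1.1.101/32": ["leaf03"]}

    print(first_try_probability(len(leaves)))
    print(expected_fabric_traversals(len(leaves)))
    print(longest_prefix_match("10.1.1.101", subnet_only))
    print(longest_prefix_match("10.1.1.101", with_host_route))
```

With four leaves, the first-try chance is 1/4 and the average comes out to 7/4 fabric traversals per flow; with the /32 present the lookup returns only the leaf that owns the server.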

> I have seen Juniper and Cisco offer some ways to overcome this in their equivalent solutions. Does Cumulus also have features to ensure optimal routing/forwarding occurs between subnets? And if so, what option is available?

We will support distributed routing (also called anycast gateways, symmetrical routing, asymmetrical routing, etc., depending on the implementation and marketing jargon used) in a future release. This is a high priority for us. However the reason it is not supported now is the commodity ASICs that support this only recently have hit the market with the features consumers wanted and previously there was VXLAN (Trident 2, Tomahawk) but the industry assumed every VXLAN was a tenant, and you would want those tenants secured by a firewall or another mechanism. So the firewall could sit on two VLANs (one VXLAN/VLAN and one routable VLAN), and that is how a host/tenant would escape a VXLAN. VXLAN is now being used in many more use cases, so both VXLAN routing and distributed routing are becoming more important.

Let me know if you want me to elaborate further. Work with your Cumulus SE if you have timelines coming up, and we can work with you to make sure we meet your goals and expectations.

Great follow-up! It is good to know this is a feature Cumulus is working on and will be supported in commodity hardware in the near future.

> I am actually a bit surprised EVPN / distributed routing worked for you if you put it on every switch.

I did nothing special except, as you pointed out, create SVIs on each leaf to serve as a gateway for the servers. I then advertised these networks into BGP using the IPv4 AFI. I confirmed the servers were using the leaf they were attached to by looking at the ARP table on the servers.

> However the reason it is not supported now is the commodity ASICs that support this only recently have hit the market with the features consumers wanted and previously there was VXLAN (Trident 2, Tomahawk) but the industry assumed every VXLAN was a tenant, and you would want those tenants secured by a firewall or another mechanism.

This is good background information to have. I think as the technology evolves and adoption becomes more widespread, folks will find use cases for multiple subnets within a single tenant. I could definitely see inter-tenant communication needing to go through a firewall, but intra-tenant traffic routed between subnets could be protected by something like iptables on the host.

Thanks again for the information!
