VXLAN/EVPN on Cumulus VX - missing BUM fdb entries


I'm building a virtual network to test VXLAN with EVPN using Cumulus VX 3.2.1 with the EVPN package (using the instructions here). I've successfully set up the EVPN address family and confirmed that both Type-3 and Type-2 routes are being passed between my VTEPs, but I'm still not able to pass BUM traffic, only unicast Ethernet (i.e., I can ping with static ARP entries on both test hosts, but ARP isn't working).

My two VTEPs are:
AS1 - 10.0.1.1, ASN 65101
AS2 - 10.0.1.2, ASN 65102

Peering is with a pair of upstream fabric switches (AS 65001 and 65002).

For example, on my first VTEP (AS1), I see:
vxlanlab-cvx-as1# show bgp evp route type multicast 
BGP table version is 0, local router ID is 10.0.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[esi]:[EthTag]:[MAClen]:[mac]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 10.0.1.1:10100
*> [3]:[0]:[32]:[10.0.1.1]
10.0.1.1 32768 i
Route Distinguisher: 10.0.1.2:10100
*> [3]:[0]:[32]:[10.0.1.2]
10.0.1.2 0 65001 65102 i
* [3]:[0]:[32]:[10.0.1.2]
10.0.1.2 0 65002 65102 i
Displayed 2 prefixes (3 paths) (of requested type)

root@vxlanlab-cvx-as1:~# bridge fdb show
52:54:00:80:a8:b8 dev swp3 master bridge permanent
52:54:00:ae:08:fa dev swp3 vlan 100 master bridge
fe:54:00:b9:dc:4e dev vtep10100 vlan 100 master bridge
fe:48:4c:30:a9:f9 dev vtep10100 master bridge permanent
52:54:00:4c:3b:af dev vtep10100 vlan 100 master bridge
52:54:00:4c:3b:af dev vtep10100 dst 10.0.1.2 self
fe:54:00:b9:dc:4e dev vtep10100 dst 10.0.1.2 self
root@vxlanlab-cvx-as1:~#

Per the documentation, BUM FDB entries should appear here with a destination MAC of "00:00:00:00:00:00", but no such entries are present. I've confirmed that I have a kernel route for 10.0.1.2:

root@vxlanlab-cvx-as1:~# net show bgp
show bgp ipv4 unicast
=====================
BGP table version is 60, local router ID is 10.0.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath, i internal, r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete
*> 10.0.1.2/32 10.10.20.0 0 65001 65102 ?
*= 10.10.30.0 0 65002 65102 ?
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 192.168.122.1 0.0.0.0 UG 0 0 0 eth0
10.0.0.1 10.10.20.0 255.255.255.255 UGH 0 0 0 swp1
10.0.0.2 10.10.30.0 255.255.255.255 UGH 0 0 0 swp2
10.0.1.2 10.10.20.0 255.255.255.255 UGH 0 0 0 swp1
10.0.1.3 10.10.20.0 255.255.255.255 UGH 0 0 0 swp1
root@vxlanlab-cvx-as1:~# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 192.168.122.1 0.0.0.0 UG 0 0 0 eth0
10.0.0.1 10.10.20.0 255.255.255.255 UGH 0 0 0 swp1
10.0.0.2 10.10.30.0 255.255.255.255 UGH 0 0 0 swp2
10.0.1.2 10.10.20.0 255.255.255.255 UGH 0 0 0 swp1
10.0.1.3 10.10.20.0 255.255.255.255 UGH 0 0 0 swp1

What else should I be checking here? Or am I running into a limitation in Cumulus VX, possibly?
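
For anyone reproducing the check described above, here is a minimal shell sketch. It assumes a BUM (head-end replication) entry appears in "bridge fdb show" output as a line that starts with the all-zero MAC and carries a "dst <remote VTEP IP>" field. The function name missing_bum_vteps is hypothetical and not part of any Cumulus tool. Given the remote VTEP IPs seen in the Type-3 routes, it prints those that have no such entry.

```shell
# Read "bridge fdb show" output on stdin; for each remote VTEP IP passed as an
# argument, print the IP if no all-zero-MAC entry with "dst <IP>" is present.
# Assumption: BUM entries look like "00:00:00:00:00:00 dev <vxlan-if> dst <IP> ...".
missing_bum_vteps() {
    fdb=$(cat)
    for vtep in "$@"; do
        printf '%s\n' "$fdb" | awk -v ip="$vtep" '
            $1 == "00:00:00:00:00:00" {
                # Look for the "dst" keyword followed by the exact VTEP IP.
                for (i = 2; i < NF; i++)
                    if ($i == "dst" && $(i + 1) == ip) found = 1
            }
            END { exit !found }' || printf '%s\n' "$vtep"
    done
}

# Example use on the switch (vtep10100 and 10.0.1.2 are taken from the output above):
#   bridge fdb show dev vtep10100 | missing_bum_vteps 10.0.1.2
```

With the fdb output shown in this post, the example would print 10.0.1.2, which matches the symptom: unicast entries exist for the remote VTEP, but the flood entry does not.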

5 replies

Userlevel 4
Are you running VRF at all? I hit a bug (which we will fix for GA) with mgmt VRF; there is an easy workaround:

#ifdown -a -X eth0 -X mgmt
#vrf exec default ifup -a -X eth0 -X mgmt

If not, try a networking restart (sudo service networking restart) and see if the bridge fdb entries come back.

If that still doesn't work, join our Slack channel. I am on holiday today, but we can set up a call or something: https://slack.cumulusnetworks.com/

The networking restart fixed the issue, thanks!
If I run into this again, are there any diagnostics you'd like me to grab before the restart to help you root-cause this?
Userlevel 4
Next time it happens, try to remember what the order of operations was, grab a cl_support, and ping me on Slack or on here. I am hoping we won't see this on the next rev. 🙂 There should be a dual-attach demo out soon as well.
After watching this longer, it doesn't appear that the networking restart is a permanent fix: the entries are eventually flushed from the fdb, and it takes another restart (and, in some cases, a restart of quagga as well) to restore them.

I've generated a couple cl-support files; let me know where you'd like me to send them. Thanks!
Userlevel 4
Hey Chris,

Can you run "dpkg -l quagga" on the switches? Engineering has looked at the cl_supports, and it looks like the node vxlanlab-cvx-as3 is not running the EVPN-enabled quagga. You should see "cl3eau8" in the output, indicating the EVPN-enabled build.
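
If you want to script that check across several switches, a rough sketch follows. The only fact it relies on is the "cl3eau8" tag mentioned above; the function name is_evpn_quagga is made up for illustration, and the sketch simply looks for the tag anywhere in whatever text it is given.

```shell
# Succeed if the given text (for example, the output of "dpkg -l quagga")
# contains the EVPN build tag mentioned in this thread.
# Assumption: the tag appears verbatim somewhere in the dpkg output.
is_evpn_quagga() {
    case "$1" in
        *cl3eau8*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Example use on a switch:
#   if is_evpn_quagga "$(dpkg -l quagga)"; then
#       echo "EVPN-enabled quagga installed"
#   else
#       echo "quagga without the EVPN tag"
#   fi
```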
