Question

MLAG with Cumulus VX in vSphere

  • 15 January 2019

Hi, I have read some previous posts about issues with Cumulus VX instances running MLAG.
I have a pair running in vSphere and I can't get MLAG working, so I wanted to see if someone here can give me a clue. The output below shows that each switch reports the peer as not alive, and the peer is not pingable from either end. If I strip the port configuration and assign a plain /30 IP address, the two switches can ping each other. Further down, I have posted a snippet of my interface config.

Thanks!

==net show clag on sw1==
cumulus@wceg01:mgmt-vrf:~$ net sh clag
The peer is not alive
Our Priority, ID, and Role: 1000 00:50:56:98:c3:ce primary
Peer Interface and IP: peerlink.4094 169.254.10.2
VxLAN Anycast IP: 10.3.199.10
Backup IP: 10.3.199.2 (active)
System MAC: 44:38:39:ff:00:99

CLAG Interfaces
Our Interface Peer Interface CLAG Id Conflicts Proto-Down Reason
---------------- ---------------- ------- -------------------- -----------------
L3REPNET - - - -
L3CLIENT - - - -
L3WCDC - - - -
L3ENTERPRISE - - - -
L3INTERNET - - - -
cumulus@wceg01:mgmt-vrf:~$

==net show clag on sw2==
cumulus@wceg02:mgmt-vrf:~$ net sh clag
The peer is not alive
Our Priority, ID, and Role: 2000 00:50:56:98:72:02 secondary
Peer Interface and IP: peerlink.4094 169.254.10.1
VxLAN Anycast IP: 10.3.199.10
Backup IP: 10.3.199.1 (active)
System MAC: 44:38:39:ff:00:99

CLAG Interfaces
Our Interface Peer Interface CLAG Id Conflicts Proto-Down Reason
---------------- ---------------- ------- -------------------- -----------------
L3REPNET - - - isl-down,vxlan-single
L3CLIENT - - - isl-down,vxlan-single
L3WCDC - - - isl-down,vxlan-single
L3ENTERPRISE - - - isl-down,vxlan-single
L3INTERNET - - - isl-down,vxlan-single
cumulus@wceg02:mgmt-vrf:~$

==sw1==
auto WCDC
iface WCDC
vrf-table auto

auto bridge
iface bridge
bridge-ports L3CLIENT L3ENTERPRISE L3INTERNET L3REPNET L3WCDC
bridge-vids 4001-4005
bridge-vlan-aware yes

auto mgmt
iface mgmt
address 127.0.0.1/8
vrf-table auto

auto peerlink
iface peerlink
bond-mode 802.3ad
bond-slaves swp9

auto peerlink.4094
iface peerlink.4094
address 169.254.10.1/30
clagd-backup-ip 10.3.199.2
clagd-enable yes
clagd-peer-ip 169.254.10.2
clagd-priority 1000
clagd-sys-mac 44:38:39:FF:00:99



==sw2==
auto WCDC
iface WCDC
vrf-table auto

auto bridge
iface bridge
bridge-ports L3CLIENT L3ENTERPRISE L3INTERNET L3REPNET L3WCDC
bridge-vids 4001-4005
bridge-vlan-aware yes

auto mgmt
iface mgmt
address 127.0.0.1/8
vrf-table auto

auto peerlink
iface peerlink
bond-mode 802.3ad
bond-slaves swp9

auto peerlink.4094
iface peerlink.4094
address 169.254.10.2/30
clagd-backup-ip 10.3.199.1
clagd-enable yes
clagd-peer-ip 169.254.10.1
clagd-priority 2000
clagd-sys-mac 44:38:39:FF:00:99

6 replies

Userlevel 5
What does 'net show interfaces' say on each node?
How about 'sudo cat /proc/net/bonding/peerlink' ?

I can see that the peerlink is not a member of the bridge, which is another error, but the bond itself should still come up. Note that CLAG reports "isl-down" on "wceg02"; ISL here means Inter-Switch Link, i.e. the peerlink.
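As a minimal sketch of that fix, assuming the bridge is named "bridge" as in the posted config and that NCLU is used to manage the interfaces file, the peerlink could be added as a bridge member like this on both switches:

```
# Add the peerlink bond to the VLAN-aware bridge (run on wceg01 and wceg02)
net add bridge bridge ports peerlink
# Review the staged changes, then apply them
net pending
net commit
```
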
Ah, right.
I have been going through iterations of configuring it and stripping it back, so I believe I missed that bridge config on this iteration. However, even with the peerlink as a member of the bridge, the peer does not come alive.

Here are the outputs you requested:

==sw1==
cumulus@wceg01:mgmt-vrf:~$ net sh int
State Name Spd MTU Mode LLDP Summary
----- ------------- --- ----- ------------- ------------------- ------------------------
UP lo N/A 65536 Loopback IP: 127.0.0.1/8
lo IP: 10.3.199.1/32
lo IP: 10.3.199.10/32
lo IP: ::1/128
UP eth0 1G 1500 Mgmt wceg02 (eth0) Master: mgmt(UP)
eth0 IP: 10.25.75.92/24(DHCP)
UP swp1 1G 1500 NotConfigured wcss01 (swp5)
UP swp2 1G 1500 NotConfigured wcss02 (swp5)
UP swp3 1G 1500 NotConfigured wcss07 (swp5)
UP swp4 1G 1500 NotConfigured wcss08 (swp5)
UP swp5 1G 1500 NotConfigured client01 (swp1)
UP swp6 1G 1500 NotConfigured internet01 (swp1)
UP swp7 1G 1500 NotConfigured repnet01 (swp1)
UP swp8 1G 1500 NotConfigured enterprise01 (swp1)
UP swp9 1G 1500 BondMember wceg02 (swp9) Master: peerlink(UP)
UP CLIENT N/A 65536 NotConfigured
UP ENTERPRISE N/A 65536 NotConfigured
UP INTERNET N/A 65536 NotConfigured
UP L3CLIENT N/A 1500 Access/L2 Master: bridge(UP)
UP L3ENTERPRISE N/A 1500 Access/L2 Master: bridge(UP)
UP L3INTERNET N/A 1500 Access/L2 Master: bridge(UP)
UP L3REPNET N/A 1500 Access/L2 Master: bridge(UP)
UP L3WCDC N/A 1500 Access/L2 Master: bridge(UP)
UP REPNET N/A 65536 NotConfigured
UP WCDC N/A 65536 NotConfigured
UP bridge N/A 1500 Bridge/L2
UP mgmt N/A 65536 Interface/L3 IP: 127.0.0.1/8
UP peerlink 1G 1500 802.3ad Master: bridge(UP)
peerlink Bond Members: swp9(UP)
UP peerlink.4094 1G 1500 SubInt/L3 IP: 169.254.10.1/30
UP vlan4001 N/A 1500 Interface/L3 Master: WCDC(UP)
vlan4001 IP: 10.40.1.1/24
UP vlan4002 N/A 1500 Interface/L3 Master: CLIENT(UP)
vlan4002 IP: 10.40.2.1/24
UP vlan4003 N/A 1500 Interface/L3 Master: INTERNET(UP)
vlan4003 IP: 10.40.3.1/24
UP vlan4004 N/A 1500 Interface/L3 Master: REPNET(UP)
vlan4004 IP: 10.40.4.1/24
UP vlan4005 N/A 1500 Interface/L3 Master: ENTERPRISE(UP)
vlan4005 IP: 10.40.5.1/24

cumulus@wceg01:mgmt-vrf:~$ sudo cat /proc/net/bonding/peerlink
sudo: unable to resolve host wceg01
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 1
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 00:50:56:98:c3:ce
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1
Actor Key: 9
Partner Key: 9
Partner Mac Address: 00:50:56:98:72:02

Slave Interface: swp9
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:50:56:98:c3:ce
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 00:50:56:98:c3:ce
port key: 9
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 65535
system mac address: 00:50:56:98:72:02
oper key: 9
port priority: 255
port number: 1
port state: 63
cumulus@wceg01:mgmt-vrf:~$

==sw2==
cumulus@wceg02:mgmt-vrf:~$ net sh int
State Name Spd MTU Mode LLDP Summary
----- ------------- --- ----- ------------- --------------------------- ------------------------
UP lo N/A 65536 Loopback IP: 127.0.0.1/8
lo IP: 10.3.199.2/32
lo IP: ::1/128
UP eth0 1G 1500 Mgmt cali12c (00:50:56:98:f7:12) Master: mgmt(UP)
eth0 IP: 10.25.75.93/24(DHCP)
UP swp1 1G 1500 NotConfigured wcss01 (swp6)
UP swp2 1G 1500 NotConfigured wcss02 (swp6)
UP swp3 1G 1500 NotConfigured wcss07 (swp6)
UP swp4 1G 1500 NotConfigured wcss08 (swp6)
UP swp5 1G 1500 NotConfigured client01 (swp2)
UP swp6 1G 1500 NotConfigured internet01 (swp2)
UP swp7 1G 1500 NotConfigured repnet01 (swp2)
UP swp8 1G 1500 NotConfigured enterprise01 (swp2)
UP swp9 1G 1500 BondMember wceg01 (swp9) Master: peerlink(UP)
UP CLIENT N/A 65536 NotConfigured
UP ENTERPRISE N/A 65536 NotConfigured
UP INTERNET N/A 65536 NotConfigured
DN L3CLIENT N/A 1500 Access/L2 Master: bridge(UP)
DN L3ENTERPRISE N/A 1500 Access/L2 Master: bridge(UP)
DN L3INTERNET N/A 1500 Access/L2 Master: bridge(UP)
DN L3REPNET N/A 1500 Access/L2 Master: bridge(UP)
DN L3WCDC N/A 1500 Access/L2 Master: bridge(UP)
UP REPNET N/A 65536 NotConfigured
UP WCDC N/A 65536 NotConfigured
UP bridge N/A 1500 Bridge/L2
UP mgmt N/A 65536 Interface/L3 IP: 127.0.0.1/8
UP peerlink 1G 1500 802.3ad Master: bridge(UP)
peerlink Bond Members: swp9(UP)
UP peerlink.4094 1G 1500 SubInt/L3 IP: 169.254.10.2/30
UP vlan4001 N/A 1500 Interface/L3 Master: WCDC(UP)
vlan4001 IP: 10.40.1.1/24
UP vlan4002 N/A 1500 Interface/L3 Master: CLIENT(UP)
vlan4002 IP: 10.40.2.1/24
UP vlan4003 N/A 1500 Interface/L3 Master: INTERNET(UP)
vlan4003 IP: 10.40.3.1/24
UP vlan4004 N/A 1500 Interface/L3 Master: REPNET(UP)
vlan4004 IP: 10.40.4.1/24
UP vlan4005 N/A 1500 Interface/L3 Master: ENTERPRISE(UP)
vlan4005 IP: 10.40.5.1/24

cumulus@wceg02:mgmt-vrf:~$ sudo cat /proc/net/bonding/peerlink
sudo: unable to resolve host wceg02
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 1
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 00:50:56:98:72:02
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1
Actor Key: 9
Partner Key: 9
Partner Mac Address: 00:50:56:98:c3:ce

Slave Interface: swp9
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:50:56:98:72:02
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 00:50:56:98:72:02
port key: 9
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 65535
system mac address: 00:50:56:98:c3:ce
oper key: 9
port priority: 255
port number: 1
port state: 63
cumulus@wceg02:mgmt-vrf:~$
Userlevel 5
So I can see now that the peerlink bond looks to be up and healthy. At this point the next thing to look at is CLAG itself.

What does the output of 'net show clag verbose' look like on each side?
Here is the output. Both switches still show the peer as "not alive".

==sw1==
cumulus@wceg01:mgmt-vrf:~$ net show clag verbose
The peer is not alive
Our Priority, ID, and Role: 1000 00:50:56:98:c3:ce primary
Peer Interface and IP: peerlink.4094 169.254.10.2
VxLAN Anycast IP: 10.3.199.10
Backup IP: 10.3.199.2 (active)
System MAC: 44:38:39:ff:00:99

CLAG Interfaces
Our Interface Peer Interface CLAG Id Conflicts Proto-Down Reason
---------------- ---------------- ------- -------------------- -----------------
L3REPNET - - - -
L3CLIENT - - - -
L3WCDC - - - -
L3ENTERPRISE - - - -
L3INTERNET - - - -

Our LACP Information
Our Interface Partner MAC CIST PortId CLAG Id Oper St Flags
---------------- ----------------- ----------- ------- ------- -----

Peer LACP Information
Peer Interface Partner MAC CIST PortId CLAG Id Oper St Flags
---------------- ----------------- ----------- ------- ------- -----

Backup info:
IP: 10.3.199.2; State: active; Role: primary
Peer priority and id: 2000 00:50:56:98:72:02; Peer role: secondary

Our Interface Dynamic MAC VLAN Id
---------------- ----------------- -------

Peer Interface Dynamic MAC VLAN Id
---------------- ----------------- -------

Interface MAC Address VLAN Id State
---------------- ----------------- ------- ---------
vlan4005 72:a6:04:32:b3:c8 4005 permanent
vlan4003 72:a6:04:32:b3:c8 4003 permanent
vlan4004 72:a6:04:32:b3:c8 4004 permanent
vlan4001 72:a6:04:32:b3:c8 4001 permanent
vlan4002 72:a6:04:32:b3:c8 4002 permanent

IP Address Our Interface Dynamic MAC VLAN Id Owner
------------------------- ---------------- ----------------- ------- ----------
fe80::70a6:4ff:fe32:b3c8 vlan4001 72:a6:04:32:b3:c8 4001 local
fe80::70a6:4ff:fe32:b3c8 vlan4002 72:a6:04:32:b3:c8 4002 local
fe80::70a6:4ff:fe32:b3c8 vlan4003 72:a6:04:32:b3:c8 4003 local
fe80::70a6:4ff:fe32:b3c8 vlan4004 72:a6:04:32:b3:c8 4004 local
fe80::70a6:4ff:fe32:b3c8 vlan4005 72:a6:04:32:b3:c8 4005 local
10.40.3.1 vlan4003 72:a6:04:32:b3:c8 4003 local
10.40.4.1 vlan4004 72:a6:04:32:b3:c8 4004 local
10.40.1.1 vlan4001 72:a6:04:32:b3:c8 4001 local
10.40.2.1 vlan4002 72:a6:04:32:b3:c8 4002 local
10.40.5.1 vlan4005 72:a6:04:32:b3:c8 4005 local

Our Multicast Group Port VLAN Id Device Age
---------------------- ---------------- ------- ---------------- ---

Peer Multicast Group Port VLAN Id Device Age
---------------------- ---------------- ------- ---------------- ---

Destination Our Interface Dynamic MAC VLAN Id Ownership
------------------------- ---------------- ----------------- ------- ----------

Our Router Port Device Age
---------------- ---------------- ---

Peer Router Port Device Age
---------------- ---------------- ---

Socket State
---------------- ----------------
socketToPeer Not Connected
socketFromPeer Not Connected
serverSocket Listening

Database md5 hash
---------------- --------------------------------
neighDB d41d8cd98f00b204e9800998ecf8427e

Timers Time remaining
---------------- --------------
startup-delay 00:00:00

Our VLAN Information
Our Interface VLAN Id
---------------- -------

Peer VLAN Information
Peer Interface VLAN Id
---------------- -------
cumulus@wceg01:mgmt-vrf:~$


==sw2==
cumulus@wceg02:mgmt-vrf:~$ net show clag verbose
The peer is not alive
Our Priority, ID, and Role: 2000 00:50:56:98:72:02 secondary
Peer Interface and IP: peerlink.4094 169.254.10.1
VxLAN Anycast IP: 10.3.199.10
Backup IP: 10.3.199.1 (active)
System MAC: 44:38:39:ff:00:99

CLAG Interfaces
Our Interface Peer Interface CLAG Id Conflicts Proto-Down Reason
---------------- ---------------- ------- -------------------- -----------------
L3REPNET - - - isl-down,vxlan-single
L3CLIENT - - - isl-down,vxlan-single
L3WCDC - - - isl-down,vxlan-single
L3ENTERPRISE - - - isl-down,vxlan-single
L3INTERNET - - - isl-down,vxlan-single

Our LACP Information
Our Interface Partner MAC CIST PortId CLAG Id Oper St Flags
---------------- ----------------- ----------- ------- ------- -----

Peer LACP Information
Peer Interface Partner MAC CIST PortId CLAG Id Oper St Flags
---------------- ----------------- ----------- ------- ------- -----

Backup info:
IP: 10.3.199.1; State: active; Role: secondary
Peer priority and id: 1000 00:50:56:98:c3:ce; Peer role: primary

Our Interface Dynamic MAC VLAN Id
---------------- ----------------- -------

Peer Interface Dynamic MAC VLAN Id
---------------- ----------------- -------

Interface MAC Address VLAN Id State
---------------- ----------------- ------- ---------
vlan4003 12:c6:47:15:2d:48 4003 permanent
vlan4004 12:c6:47:15:2d:48 4004 permanent
vlan4001 12:c6:47:15:2d:48 4001 permanent
vlan4002 12:c6:47:15:2d:48 4002 permanent
vlan4005 12:c6:47:15:2d:48 4005 permanent

IP Address Our Interface Dynamic MAC VLAN Id Owner
------------------------- ---------------- ----------------- ------- ----------
fe80::10c6:47ff:fe15:2d48 vlan4001 12:c6:47:15:2d:48 4001 local
fe80::10c6:47ff:fe15:2d48 vlan4005 12:c6:47:15:2d:48 4005 local
fe80::10c6:47ff:fe15:2d48 vlan4003 12:c6:47:15:2d:48 4003 local
10.40.3.1 vlan4003 12:c6:47:15:2d:48 4003 local
fe80::10c6:47ff:fe15:2d48 vlan4004 12:c6:47:15:2d:48 4004 local
10.40.4.1 vlan4004 12:c6:47:15:2d:48 4004 local
10.40.1.1 vlan4001 12:c6:47:15:2d:48 4001 local
fe80::10c6:47ff:fe15:2d48 vlan4002 12:c6:47:15:2d:48 4002 local
10.40.2.1 vlan4002 12:c6:47:15:2d:48 4002 local
10.40.5.1 vlan4005 12:c6:47:15:2d:48 4005 local

Our Multicast Group Port VLAN Id Device Age
---------------------- ---------------- ------- ---------------- ---

Peer Multicast Group Port VLAN Id Device Age
---------------------- ---------------- ------- ---------------- ---

Destination Our Interface Dynamic MAC VLAN Id Ownership
------------------------- ---------------- ----------------- ------- ----------

Our Router Port Device Age
---------------- ---------------- ---

Peer Router Port Device Age
---------------- ---------------- ---

Socket State
---------------- ----------------
socketToPeer Not Connected
socketFromPeer Not Connected
serverSocket Listening

Database md5 hash
---------------- --------------------------------
neighDB d41d8cd98f00b204e9800998ecf8427e

Timers Time remaining
---------------- --------------
startup-delay 00:00:00

Our VLAN Information
Our Interface VLAN Id
---------------- -------

Peer VLAN Information
Peer Interface VLAN Id
---------------- -------
cumulus@wceg02:mgmt-vrf:~$ ^C
I have the same issue on vSphere with Cumulus VX and an MLAG config. Is there any resolution to this, or a how-to guide on setting up the vSphere environment correctly so that a Cumulus VX MLAG config works properly?
Userlevel 5
Unfortunately, as a Linux company we tend to be pretty focused on KVM for our simulations. I don't have any ESX nodes on which I could write such a guide.
In the above example the issue seemed to center on the peerlink configuration. If you're able to use NCLU to set up the CLAG relationship, that should correct the issues above.
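
A minimal sketch of what that could look like for the two switches in this thread, reusing the system MAC, peerlink port, and backup IPs from the original post. It assumes a Cumulus Linux release whose NCLU includes the "net add clag peer" shortcut, that the hand-written peerlink and peerlink.4094 stanzas have been removed first, and that the backup IPs are reachable in the default VRF; none of these assumptions is confirmed in the thread.

```
# On wceg01 (primary)
net add clag peer sysmac 44:38:39:FF:00:99 interface swp9 primary backup-ip 10.3.199.2
net pending
net commit

# On wceg02 (secondary)
net add clag peer sysmac 44:38:39:FF:00:99 interface swp9 secondary backup-ip 10.3.199.1
net pending
net commit
```

Afterwards, 'net show clag' on each switch should show whether the peer is now reported as alive.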
