VXLAN MTU > 1500 bytes


I have created an L3 leaf-spine topology consisting of 4 nodes (2x leaf and 2x spine) with Cumulus VX, and I'm having problems with the MTU on the VXLAN interfaces.

The interfaces between the leafs and spines are configured with a 2000-byte MTU (and I can ping across them with DF set at up to 2000 bytes), but the VXLAN interface appears to support an MTU of at most 1500 bytes. Trying to increase it (ip link set mtu 1600 vni-X) just returns an error. Setting it to 1500 or lower works.

You can send 1500-byte packets into the VXLAN (1550 bytes after encapsulation), but nothing larger gets through.
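
The 1550-byte figure follows from the per-packet VXLAN overhead. A minimal sketch of that arithmetic, assuming an IPv4 underlay, an untagged inner Ethernet frame and no IP options; the function name vxlan_outer_ip_size is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: size of the outer IP packet that carries an inner
# IP packet of the given size through a VXLAN tunnel.
# Assumes: IPv4 underlay, untagged inner Ethernet frame, no IP options.
vxlan_outer_ip_size() {
    inner_ip=$1
    inner_eth=14   # inner Ethernet header carried inside the tunnel
    vxlan_hdr=8    # VXLAN header
    udp_hdr=8      # outer UDP header
    outer_ip=20    # outer IPv4 header
    echo $(( inner_ip + inner_eth + vxlan_hdr + udp_hdr + outer_ip ))
}

vxlan_outer_ip_size 1500   # prints 1550
```

With an IPv6 underlay or VLAN-tagged inner frames the overhead would be larger, so the constants above would need adjusting.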

I've tried both bridging (when you put the VXLAN interface into a bridge, the bridge inherits the lowest MTU of all its slave ports) and simply setting an IP address on the VXLAN interface. Neither works for packets larger than 1500 bytes.
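
The bridge behaviour described here can be modelled as taking the minimum of the member port MTUs. A rough sketch of that rule only, not of how the kernel implements it; bridge_effective_mtu is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: the MTU a bridge would report, given the MTUs of
# its member ports, under the "lowest member MTU wins" rule described above.
bridge_effective_mtu() {
    lowest=$1
    for mtu in "$@"; do
        if [ "$mtu" -lt "$lowest" ]; then
            lowest=$mtu
        fi
    done
    echo "$lowest"
}

# e.g. a 2000-byte port bridged with a VXLAN interface stuck at 1500
bridge_effective_mtu 2000 1500   # prints 1500
```

So as long as the VXLAN interface is capped at 1500, raising the MTU on the other bridge ports would not be expected to help.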

This means that you can't transport jumbo frames across VXLAN, so it can't be used as a drop-in replacement in networks where that matters.

Have I missed some way of making this work?

3 replies

Hey Matt,

Did you actually try a ping? Use ping with -I and an address to force the source. Older releases had a bug with the VXLAN MTU: larger packets would work on Cumulus Linux, but the MTU would not be reflected correctly.

Can you also provide the kernel you are running?
cumulus@leaf1$ uname -r
3.2.68-6+cl25u2

Also do a
cat /etc/lsb-release

Sean,

Yep, I tried a ping. It works as long as the packet size doesn't exceed 1500 bytes on ingress to the VXLAN interface.

From /etc/network/interfaces:

iface vni-101001
address 10.0.1.1/30
vxlan-id 101001
vxlan-local-tunnelip 10.65.0.3

The other end of the tunnel is 10.65.0.4:

root@CumulusVX-L1:~# vxrdctl vxlans
VNI Local Addr Svc Node
=== ========== ========
101001 10.65.0.3 10.65.0.254

root@CumulusVX-L1:~# vxrdctl peers
VNI Peer Addrs
=== ==========
101001 10.65.0.3, 10.65.0.4

Pinging the 10.65.0.4 address with a 2000-byte packet works:

root@CumulusVX-L1:~# ping -M do -s 1972 -c 3 10.65.0.4
PING 10.65.0.4 (10.65.0.4) 1972(2000) bytes of data.
1980 bytes from 10.65.0.4: icmp_req=1 ttl=63 time=0.761 ms
1980 bytes from 10.65.0.4: icmp_req=2 ttl=63 time=0.509 ms
1980 bytes from 10.65.0.4: icmp_req=3 ttl=63 time=0.558 ms

--- 10.65.0.4 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.509/0.609/0.761/0.110 ms

Unsurprisingly, a 2001-byte packet doesn't (because my interface MTUs are 2000 bytes):

root@CumulusVX-L1:~# ping -M do -s 1973 -c 3 10.65.0.4
PING 10.65.0.4 (10.65.0.4) 1973(2001) bytes of data.
From 10.129.0.2 icmp_seq=1 Frag needed and DF set (mtu = 2000)
From 10.129.0.2 icmp_seq=1 Frag needed and DF set (mtu = 2000)
From 10.129.0.2 icmp_seq=1 Frag needed and DF set (mtu = 2000)

--- 10.65.0.4 ping statistics ---
0 packets transmitted, 0 received, +3 errors
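
The -s values used in these tests are the target packet size minus the IPv4 and ICMP headers. A small sketch of that conversion, assuming IPv4 without options; ping_payload_for is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: ICMP payload size to pass to `ping -s` so that the
# resulting IPv4 packet is exactly the given size.
# Assumes IPv4 without options: 20-byte IP header + 8-byte ICMP header.
ping_payload_for() {
    packet_size=$1
    echo $(( packet_size - 20 - 8 ))
}

ping_payload_for 2000   # prints 1972
ping_payload_for 1500   # prints 1472
```

Combined with -M do (don't fragment), this makes it possible to probe for the exact size at which a path stops forwarding.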

Ping from the VXLAN interface with a 1500 byte packet works:

root@CumulusVX-L1:~# ping -I vni-101001 -c 3 10.0.1.2 -M do -s 1472
PING 10.0.1.2 (10.0.1.2) from 10.0.1.1 vni-101001: 1472(1500) bytes of data.
1480 bytes from 10.0.1.2: icmp_req=1 ttl=64 time=12.4 ms
1480 bytes from 10.0.1.2: icmp_req=2 ttl=64 time=1.01 ms
1480 bytes from 10.0.1.2: icmp_req=3 ttl=64 time=0.959 ms

--- 10.0.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.959/4.794/12.409/5.384 ms

Ping with a 1501-byte packet doesn't:

root@CumulusVX-L1:~# ping -I vni-101001 -c 3 10.0.1.2 -M do -s 1473
PING 10.0.1.2 (10.0.1.2) from 10.0.1.1 vni-101001: 1473(1501) bytes of data.
From 10.0.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 10.0.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 10.0.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1500)

This is the MTU as set by default:

cumulus@CumulusVX-L1$ ip link show vni-101001
15: vni-101001: mtu 1500 qdisc noqueue master br-10 state UNKNOWN mode DEFAULT
link/ether 56:e3:f4:b0:fe:20 brd ff:ff:ff:ff:ff:ff

If I try to increase the MTU on the VXLAN interface above 1500, it fails, while values of 1500 or lower are accepted:

cumulus@CumulusVX-L1$ sudo ip link set mtu 1600 dev vni-101001
RTNETLINK answers: Invalid argument

cumulus@CumulusVX-L1$ sudo ip link set mtu 1400 dev vni-101001
cumulus@CumulusVX-L1$

cumulus@CumulusVX-L1$ sudo ip link set mtu 1500 dev vni-101001
cumulus@CumulusVX-L1$

cumulus@CumulusVX-L1$ sudo ip link set mtu 1501 dev vni-101001
RTNETLINK answers: Invalid argument
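
For reference, on a system where the interface does accept larger values, the underlay MTU still bounds what the VXLAN interface can usefully carry. A sketch of that bound, assuming an IPv4 underlay and 50 bytes of encapsulation overhead; max_vxlan_mtu is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: largest MTU the VXLAN interface could carry without
# the encapsulated packet exceeding the underlay MTU.
# Assumes an IPv4 underlay and 50 bytes of overhead
# (14 inner Ethernet + 8 VXLAN + 8 UDP + 20 outer IPv4).
max_vxlan_mtu() {
    underlay_mtu=$1
    echo $(( underlay_mtu - 50 ))
}

max_vxlan_mtu 2000   # prints 1950
```

Under these assumptions, the 1600 attempted above would fit comfortably within a 2000-byte underlay, which suggests the rejection is a limit of the interface itself rather than of the path.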

In terms of the output you requested:

cumulus@CumulusVX-L1$ uname -a
Linux CumulusVX-L1 3.2.65-1+deb7u2+cl2.5+2 #3.2.65-1+deb7u2+cl2.5+2 SMP Wed Jul 29 14:21:03 PDT 2015 x86_64 GNU/Linux

cumulus@CumulusVX-L1$ lsb_release -a
No LSB modules are available.
Distributor ID: Cumulus Linux
Description: 2.5.6-f23440b-201601151545-build
Release: 2.5.6
Codename: wheezy

To close this off, it now works. The problem seemed to be my Frankenstein OS version: it started as 2.5.3 and had some upgrades applied (to 2.5.6) before I realised that this wasn't supported.

I've reinstalled with 2.5.5 (made much easier by having 95% of my config applied using Ansible) and I can now increase the MTU on the VXLAN (vtep) interface. This is also reflected in the bridge MTU.