How is the Management VRF special?


Userlevel 1
Hi,

I'm wondering what being a mgmt VRF implies exactly.

Mostly, what's the difference between:
* Using a mgmt VRF + using the default one for "public internet" routing.
* Using the default one for mgmt + using a data plane VRF for "public internet".

Currently I'm leaning toward the second solution since it seems to have fewer limitations (e.g. it's not limited to eth0). Also, if I ever want OSPF for my internal private network, that part has to be in the default VRF (not sure if OSPF in a VRF is planned for the future?).

Here's a bit of explanation about my setup if needed for context.

- 2 * 10G switches running Cumulus with MLAGs to the hosts.

- The switches' 'eth0' ports are connected to an 'emergency' gateway that will only be used as a last resort. Most likely we won't administer them through there unless something goes very wrong.

- What we actually call the "management network", which has the machine BMCs and such, is just a separate VLAN connected to the same switches (there are far more ports than we need, so we can afford to 'waste' a few 10G ports on 1G copper for that). The switches would have an L3 interface on that VLAN, and that's where they would be managed from most of the time and where they would get their 'management' internet access (fetching packages, ...).

- Most machines (physical & virtual) only have private RFC1918 IPs (in the management VLANs and most other VLANs for inter-VM communication), and I'm trying to keep that as separate from the "public internet" side of things as possible. Machines on that private network get their internet access from routers "on a stick" on those switches (doing the NAT, firewall, ...).

- Routes in those private IP VLANs are redistributed using OSPF (to other sites), and there are dedicated routers that handle that. So technically the switches don't have to participate in OSPF, but maybe in the future ...

- Those switches act as our border routers to the public internet (but they only receive default routes + a few more specific ones). I would put all the BGP and public IP stuff in a separate VRF.

Maybe actually doing both options at once is worth considering:
- Use the mgmt VRF for the 'emergency access' so it has its own routes.
- Use the default VRF for our RFC1918 internal network
- Use a data plane VRF for all the BGP public internet stuff.

Cheers,

Sylvain

10 replies

Userlevel 4
The mgmt VRF is a special VRF for eth0 (or eth1 on some switch platforms). The eth ports are software-only ports that are not hardware accelerated.

- The switches' 'eth0' ports are connected to an 'emergency' gateway that will only be used as a last resort. Most likely we won't administer them through there unless something goes very wrong.

You only want OOB (out-of-band) traffic to use your eth0, whether or not you have a VRF. This is because this port is SW switched, not HW switched.

Where a VRF helps is when you have multiple matching routes. For example, you could have a default route (0.0.0.0/0) on eth0 that you got from your DHCP server, and a default route from your internet service provider for your data plane traffic. You now have two routes:
  • 0.0.0.0/0 via DHCP (Kernel Route), Admin Distance 0
  • 0.0.0.0/0 via OSPF or BGP, Admin Distance > 1
What happens is that the route from your dynamic routing protocol never gets installed, because the kernel route has the lower admin distance. A VRF gives you two route tables, so you can have overlapping routes.
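
To make the overlapping-default-route problem concrete, here is a minimal Python sketch. It is only a toy model of route installation, not how the kernel or the routing daemon is implemented. The `Route` class, the `install` helper, and the BGP admin distance of 20 are assumptions chosen for illustration; the post only says the dynamic route has a distance greater than 1.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Route:
    prefix: str          # destination prefix, e.g. "0.0.0.0/0"
    source: str          # where the route was learned (label only)
    admin_distance: int  # lower value wins for the same prefix


def install(table: Dict[str, Route], route: Route) -> None:
    """Keep only the route with the lowest admin distance per prefix."""
    current = table.get(route.prefix)
    if current is None or route.admin_distance < current.admin_distance:
        table[route.prefix] = route


# Kernel route learned via DHCP on eth0 (admin distance 0, as in the post).
dhcp_default = Route("0.0.0.0/0", "kernel (DHCP on eth0)", 0)
# Default route from the upstream provider; 20 is an illustrative value.
bgp_default = Route("0.0.0.0/0", "BGP (upstream provider)", 20)

# Without a VRF: one table, so the BGP default never gets installed.
single_table: Dict[str, Route] = {}
install(single_table, dhcp_default)
install(single_table, bgp_default)

# With a mgmt VRF: one table per VRF, so both defaults coexist.
vrf_tables: Dict[str, Dict[str, Route]] = {"mgmt": {}, "default": {}}
install(vrf_tables["mgmt"], dhcp_default)
install(vrf_tables["default"], bgp_default)

if __name__ == "__main__":
    print("single table :", single_table["0.0.0.0/0"].source)
    print("mgmt VRF     :", vrf_tables["mgmt"]["0.0.0.0/0"].source)
    print("default VRF  :", vrf_tables["default"]["0.0.0.0/0"].source)
```

In the single-table case the lookup for 0.0.0.0/0 always returns the DHCP-learned route; with separate tables, data plane traffic sees only the provider's default.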

The rest of your questions don't seem to be related to VRF. What exactly is the problem? Did my answer help here? Maybe we can dive a level deeper now.
Userlevel 3
Sean Cavanaugh wrote:

The mgmt vrf is a special vrf for eth0 (or eth1 on some switch platforms). The eth ports are sof...

I like this explanation a lot, Sean. Mind if I repurpose it for the docs?
Userlevel 4
Sean Cavanaugh wrote:

The mgmt vrf is a special vrf for eth0 (or eth1 on some switch platforms). The eth ports are sof...

lol sure
Userlevel 1
Yeah, I'm aware the eth0 port should only be used for OOB administration.

But one of the things I was wondering was whether I could use a front-panel port instead for day-to-day administration. It turns out that seems like a bad idea because (1) a bunch of rate-limit rules are applied to traffic from front-panel ports to the Linux host, so this needs quite a bit of default config override, and (2) traffic using those ports goes through switchd, which tends to use CPU time.

So if I want the switch to be administrable using two different paths (eth0 itself is not redundant, and I want to have two paths in case one goes down ...), I'm better off using eth0 for "day-to-day" administration and access, and then using a front-panel port (SVI) for "emergency / recovery" in case my eth0 link is screwed up.

The other thing I was wondering is how / where the special name 'mgmt' is matched, and what different behavior it triggers vs naming it 'admin' (just an example). That name is apparently special and triggers some different parts of the code; I'm wondering which ones.

As for my reasons for using VRFs, it is mostly security / isolation: to make sure some misconfig couldn't lead to packets from the internet leaking into my internal network (since the switch is directly connected to upstream providers and I can't trust anything coming from there). It also allows me to make sure all the "apps" running on the switch (ntp / smtp / ...) go through our internal router/firewall rather than directly to the upstream providers.
Userlevel 4
Sylvain Munaut wrote:

Yeah, I'm aware the eth0 port should only be used for OOB administration.

But one of the thing I...

The mgmt name triggers special handling of DNS traffic. Specifically, if you're using the mgmt VRF and run the "ip rule ls" command, you'll see that, when applied, the mgmt VRF builds a special rule to send traffic for your currently configured DNS server out the eth0 port (instead of out your front-panel ports). From talking to David Ahern, I understand that this will be a configurable setting soon, but that is the only piece of specialness I'm aware of. I'll send this thread over to David Ahern in case I've missed something (I'm sure I have).
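
As a rough illustration of what such a rule does, the following Python sketch models policy-rule evaluation: rules are checked in priority order, and the first one whose destination prefix matches selects the routing table. This is a simplified model under stated assumptions, not the kernel's implementation (a real lookup also falls through to the next rule when the chosen table has no matching route). The DNS server address 192.0.2.53 and the priority 99 are hypothetical; the actual values on a switch are whatever "ip rule ls" shows.

```python
import ipaddress
from typing import List, Tuple

# (priority, destination prefix, table). Lower priority is evaluated first.
Rule = Tuple[int, str, str]

RULES: List[Rule] = [
    (99, "192.0.2.53/32", "mgmt"),   # hypothetical DNS server -> mgmt table
    (32766, "0.0.0.0/0", "main"),    # everything else -> main table
]


def select_table(dst: str, rules: List[Rule] = RULES) -> str:
    """Return the table chosen by the first rule matching the destination."""
    addr = ipaddress.ip_address(dst)
    for _priority, prefix, table in sorted(rules):
        if addr in ipaddress.ip_network(prefix):
            return table
    raise LookupError(f"no rule matches {dst}")


if __name__ == "__main__":
    print(select_table("192.0.2.53"))    # DNS lookups use the mgmt table
    print(select_table("198.51.100.7"))  # other traffic uses the main table
```

The point of the model is that only traffic addressed to the configured DNS server is steered into the mgmt table; everything else keeps using the normal lookup.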
Userlevel 1
Sylvain Munaut wrote:

Yeah, I'm aware the eth0 port should only be used for OOB administration.

But one of the thing I...

OK, thanks, that's really good to know.

Is that rule inserted in the HW? (i.e. if I have packets being routed from one front-panel port to another toward the DNS server IP, will they be affected?)

Sylvain Munaut wrote:

Yeah, I'm aware the eth0 port should only be used for OOB administration.

But one of the thing I...

The "mgmt" name is special cased to identify the Management VRF from a data plane VRF. As Eric mentioned, FIB rules are installed for DNS servers since that is the usual deployment case. In addition, the user shell is set to the Management VRF context at login. This allows admin tools like ansible, chef, apt-get to Just Work over the management plane with no change in how the command is run. It really comes down to making Management VRF transparent and easy, especially for new users doing a typical deployment.
Userlevel 3
On May 23, 2018, a user named Richard Pilsbury posted the following comment to our old community while we were transitioning to this new platform:

"Hi David,

Can you explain how the user shell is set to the management VRF? i.e. what config changes are made? I am trying to use a management VRF on a non-cumulus (still Debian-based) platform. I fixed the DNS issue with an ip rule, but am now facing the problem that apt etc. still tries to use the default table."
Userlevel 3
On May 23, 2018, @David Ahern replied to Richard on our old community while we were transitioning to this new platform:

Hi Richard:

libpam-script is used to check if mgmt VRF is enabled. If so, it sets login shells to the mgmt VRF context.

For non-CL platforms I suggest taking a look at my OSS slides:
http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf

And in particular the vrf + mgmt-vrf packages in
https://github.com/CumulusNetworks/vrf

(building a deb or rpm and installing that way is best - it handles the libpam-script dependencies).
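
For cases where the login shell is not placed in the mgmt VRF context automatically, a single command can also be run in a VRF explicitly. The sketch below is one possible approach, not what the mgmt-vrf package does: it assumes an iproute2 version that provides "ip vrf exec", a VRF device actually named mgmt, and sufficient privileges. The helper names `vrf_argv` and `run_in_vrf` are hypothetical.

```python
import shlex
import subprocess
from typing import List, Sequence


def vrf_argv(vrf: str, command: Sequence[str]) -> List[str]:
    """Build the argv that runs `command` inside the given VRF context."""
    if not command:
        raise ValueError("command must not be empty")
    return ["ip", "vrf", "exec", vrf, *command]


def run_in_vrf(vrf: str, command: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the command in the VRF; raises CalledProcessError on failure.

    Assumes `ip vrf exec` is available and the VRF exists (typically needs root).
    """
    return subprocess.run(vrf_argv(vrf, command), check=True)


if __name__ == "__main__":
    # Only print the command line here; call run_in_vrf() to execute it.
    print(shlex.join(vrf_argv("mgmt", ["apt-get", "update"])))
```

Wrapping each command this way is more manual than the PAM-based approach described above, which is why installing the packaged version is the more convenient route.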
Userlevel 3
Finally, on May 24, 2018, Richard Pilsbury posted this followup comment to our old community while we were transitioning to this new platform:

Fantastic - that's sorted it. Thanks for your help (and the really quick reply on a year-old thread!).
