How to verify that ECN is working


I'm currently trying to find a way to prove that the ECN (https://docs.cumulusnetworks.com/display/DOCS/Buffer+and+Queue+Management#BufferandQueueManagement-e...) mechanism is actually working.

I have 4 servers, all connected to one switch via 10GE ports. I have enabled ECN on all servers and on the switch itself for the relevant ports. Now I want to make sure that ECN is actually doing something, so I emulated interface overload: from one server I started 3 iperf TCP sessions to the 3 other servers (so 3 servers are sending towards one server).
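
A minimal sketch of such an incast test, assuming iperf3 (the post does not say which iperf version was used), passwordless ssh to the sender hosts, and one `iperf3 -s -p PORT` listener per sender on the receiver, since an iperf3 server handles one test at a time. The switch config posted later in this thread applies ECN only to `cos_list = [3]`, so the sketch also tags the traffic with a DSCP through iperf3's `-S`/`--tos` option; that DSCP 24 lands in cos 3 is an assumption to check against the switch's traffic classification. `dscp_to_tos` and `start_incast` are hypothetical helper names.

```shell
#!/bin/sh
# Convert a DSCP value to the IPv4 TOS byte expected by `iperf3 -S`.
# DSCP occupies the upper 6 bits; the low 2 bits are the ECN field, which
# Linux TCP fills in itself when ECN is enabled, so they are left at 0 here.
dscp_to_tos() {
    echo $(( $1 << 2 ))
}

# Hypothetical helper: make every sender host push TCP traffic toward one
# receiver at the same time (N-to-1 incast). Assumes passwordless ssh to each
# sender and that the receiver already runs one listener per sender, e.g.:
#   iperf3 -s -p 5201 &  iperf3 -s -p 5202 &  iperf3 -s -p 5203 &
start_incast() {
    receiver=$1
    dscp=$2
    shift 2
    tos=$(dscp_to_tos "$dscp")
    port=5201
    for sender in "$@"; do
        # -t 60: run for 60 seconds so there is time to capture on the receiver
        ssh "$sender" iperf3 -c "$receiver" -p "$port" -S "$tos" -t 60 &
        port=$((port + 1))
    done
    wait   # block until all senders have finished
}
```

Example call with placeholder names: `start_incast 10.0.0.4 24 server1 server2 server3`. Each sender gets its own port because a single iperf3 listener would reject the second and third clients while the first test runs.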

I expect that the switch port facing the receiving server will eventually become overloaded and the switch will start marking IP packets with the ECN bits set to '11' (Congestion Experienced). But I don't see any such packets on the server, capturing with:
tcpdump -i ens4f0 '(ip[1] & 3 == 3)'
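
For reference, `ip[1] & 3` reads the two low-order bits of the IPv4 TOS byte, which RFC 3168 defines as the ECN field. The hypothetical helper below maps a TOS value to its RFC 3168 codepoint name, which can help when reading `tos` values out of a capture.

```shell
#!/bin/sh
# Decode the ECN field (the two low-order bits of the IPv4 TOS byte) into its
# RFC 3168 codepoint name. Accepts decimal or 0x-prefixed hex, e.g. 0x02.
ecn_name() {
    case $(( $1 & 3 )) in
        0) echo "Not-ECT" ;;   # 00: transport is not ECN-capable
        1) echo "ECT(1)" ;;    # 01: ECN-capable transport
        2) echo "ECT(0)" ;;    # 10: ECN-capable transport
        3) echo "CE" ;;        # 11: Congestion Experienced (set by the switch)
    esac
}
```

For example, `ecn_name 0x02` prints `ECT(0)`, and `ecn_name 0x63` (DSCP 24 plus both ECN bits) prints `CE`.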

I even reduced the ECN threshold on the switch to 1000 bytes (ecn.ecn_port_group.max_threshold_bytes = 1000), with no visible result.

Could anybody advise how to actually emulate congestion and observe ECN working?

Thank you in advance.

Sergei.


Sergei,

Since you are already aware of the config guide, I will assume the config is correct here. Next, I would verify that switchd was restarted and check that your platform supports ECN with Cumulus Linux (ECN is supported only on Broadcom Tomahawk, Trident II+ and Trident II, and Mellanox Spectrum switches).

Your tcpdump filter looks OK to me, but could you also check whether packets match any of the values 0-3? We know you checked 3, but if packets match a different value, that would further confirm that ECN is not being set by the switch.
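
One way to run that check is to capture each codepoint in turn while the iperf load is running. A minimal sketch, assuming root access on the receiving server, GNU coreutils `timeout`, and steady traffic: each value is counted in a different time window, so the counts are only roughly comparable. `ecn_filter` and `count_ecn_codepoints` are hypothetical names.

```shell
#!/bin/sh
# Build the pcap filter that matches one ECN codepoint (0-3) in the IPv4 TOS byte.
ecn_filter() {
    echo "ip[1] & 3 == $1"
}

# Count packets per ECN codepoint on an interface, capturing each codepoint
# for a fixed number of seconds (default 10). Needs root for tcpdump.
count_ecn_codepoints() {
    iface=$1
    secs=${2:-10}
    for cp in 0 1 2 3; do
        # -nn: no name lookups, -l: line-buffered, -q: one short line per packet
        n=$(timeout "$secs" tcpdump -nn -l -q -i "$iface" "$(ecn_filter "$cp")" 2>/dev/null | wc -l)
        echo "codepoint $cp: $n packets"
    done
}
```

Usage on the receiver from this thread would be `count_ecn_codepoints ens4f0 10`; a non-zero count for 2 with zero for 3 matches "servers negotiate ECN, switch never marks".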

If all of this has already been verified, I'd ask that you open a support case with us and provide a cl-support file generated while the congestion is present.

Mark, thank you for the comments.

The config appears to be aligned with the guide:
ecn.port_group_list = [ecn_port_group]
ecn.ecn_port_group.cos_list = [3]
ecn.ecn_port_group.port_set = swp35-swp45
ecn.ecn_port_group.min_threshold_bytes = 1000
ecn.ecn_port_group.max_threshold_bytes = 1000
ecn.ecn_port_group.probability = 100
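
To reason about what these thresholds should do, here is a sketch of a classic RED-style marking curve. How min_threshold_bytes, max_threshold_bytes and probability actually interact in the ASIC is an assumption here: no marking below the minimum, a linear ramp up to `probability` between the thresholds, and every ECN-capable packet marked at or above the maximum. Under that model, with both thresholds at 1000 bytes, any cos 3 queue of 1000 bytes or more should produce CE marks. `mark_prob` is a hypothetical helper.

```shell
#!/bin/sh
# RED-style marking model (an assumption, not the switch's documented algorithm):
# prints the marking probability in percent for a given queue depth in bytes.
# Arguments: queue_bytes min_threshold_bytes max_threshold_bytes probability
mark_prob() {
    q=$1; min=$2; max=$3; pmax=$4
    if [ "$q" -lt "$min" ]; then
        echo 0                      # below min threshold: never mark
    elif [ "$q" -ge "$max" ]; then
        echo 100                    # at/above max threshold: always mark
    else
        # linear ramp from 0 to pmax between the thresholds (integer percent)
        echo $(( pmax * (q - min) / (max - min) ))
    fi
}
```

With the config above, `mark_prob 1500 1000 1000 100` prints 100. When min and max are equal the ramp branch is never reached, so there is no division by zero.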

We use a T3048-LY8, which I believe uses a Trident II, so the platform should be OK as well.

With tcpdump I have also tried matching on '10' (ECT(0)):
tcpdump -i ens4f0 '(ip[1] & 3 == 2)'
and I see many packets coming in (which just indicates that the servers support ECN).

Please comment on my assumptions here; if they are correct, then I will go ahead and open the case.



The last thing to check is whether switchd was restarted. The command is:
systemctl restart switchd.service
and you can check the log file with:
grep -R ECN /var/log/switchd.log
If ECN has been enabled in hardware, then I'd recommend you open a case and ask for it to be assigned to me, since I am familiar with your troubleshooting so far.

If I'm not mistaken, there are also interface counters available via ethtool for ECN.
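
If the platform does expose such counters, comparing them before and after a test run shows whether the switch marked anything. A minimal sketch: the exact counter names, and whether any of them mention ECN at all, vary by ASIC and driver, so the case-insensitive 'ecn' match is an assumption. swp40 is just an example port from the swp35-swp45 range above, and `ecn_counters` and `counter_delta` are hypothetical helpers.

```shell
#!/bin/sh
# Keep only the statistics whose name mentions ECN (case-insensitive) from
# `ethtool -S` output on stdin, with the leading indentation stripped.
ecn_counters() {
    grep -i 'ecn' | sed 's/^[[:space:]]*//'
}

# Print how much each counter grew between two snapshots ("name: value" lines).
counter_delta() {
    awk -F': *' 'NR == FNR { before[$1] = $2; next }
                 ($1 in before) { print $1 ": " ($2 - before[$1]) }' "$1" "$2"
}

# Usage (on the switch, swp40 as an example port):
#   ethtool -S swp40 | ecn_counters > before.txt
#   ...run the iperf test...
#   ethtool -S swp40 | ecn_counters > after.txt
#   counter_delta before.txt after.txt
```

If `ecn_counters` prints nothing, the driver probably does not expose an ECN-specific counter under that name, and the tcpdump check on the receiver remains the more direct test.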
