autoneg keeps reverting to off on 40G Base-CR4


We have the following transceiver:

root@s4048-r2r6-u26-leaf08:mgmt-vrf:/var/log# ethtool -m swp49
Identifier : 0x0d (QSFP+)
Extended identifier : 0x00
Extended identifier description : 1.5W max. Power consumption
Extended identifier description : No CDR in TX, No CDR in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x23 (No separable connector)
Transceiver codes : 0x08 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 40G Ethernet: 40G Base-CR4
Encoding : 0x00 (unspecified)
BR, Nominal : 10300Mbps
Rate identifier : 0x00
Length (SMF,km) : 0km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 1m
Transmitter technology : 0xa0 (Copper cable unequalized)
Attenuation at 2.5GHz : 3db
Attenuation at 5.0GHz : 5db
Attenuation at 7.0GHz : 0db
Attenuation at 12.9GHz : 0db
Vendor name : Amphenol
Vendor OUI : 41:50:48
Vendor PN : 599690001
Vendor rev : E
Vendor SN : APF12210014RWJ
Revision Compliance : Revision not specified
Module temperature : 0.00 degrees C / 32.00 degrees F
Module voltage : 0.0000 V

We set autoneg to on using 'ethtool -s swp49 speed 40000 duplex full autoneg on'. It does take effect, but after a while it turns back off by itself, causing the link to go down.
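One way to catch exactly when the setting flips is to poll `ethtool` and print a timestamp on every change, so it can be lined up with the switchd log. This is a minimal sketch, assuming a POSIX `sh` and that `ethtool <iface>` prints the standard `Auto-negotiation: on|off` line. `autoneg_state` and `watch_autoneg` are hypothetical helper names, not Cumulus tools:

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool <iface>` output on stdin and print the
# value of the "Auto-negotiation:" line ("on" or "off"). The anchored,
# case-sensitive match skips "Supports auto-negotiation" and
# "Advertised auto-negotiation".
autoneg_state() {
    awk -F': *' '/^[ \t]*Auto-negotiation:/ { print $2; exit }'
}

# Hypothetical helper: poll an interface once per second and print a
# UTC-timestamped line whenever its autoneg state changes.
watch_autoneg() {
    iface="$1"
    prev=""
    while :; do
        cur=$(ethtool "$iface" | autoneg_state)
        if [ "$cur" != "$prev" ]; then
            echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $iface autoneg=$cur"
            prev="$cur"
        fi
        sleep 1
    done
}
```

Running `watch_autoneg swp49` in a spare session and comparing its timestamps with switchd.log should show whether the flip lines up with anything else happening on the box at that moment.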

We see the following in the logs. Any clues what is wrong here?

switchd.log.1:2017-12-14T19:59:34.864902+00:00 s4048-r2r6-u26-leaf08 switchd[19677]: sync_port.c:333 sync_port_settings: pushing settings to hal (swp49: speed=40000, duplex=1, autoneg=1, port=3, mdix=1, nwords=0 sup=40G, adv=40G, lp_adv=40G)
switchd.log.1:2017-12-14T19:59:35.239990+00:00 s4048-r2r6-u26-leaf08 switchd[19677]: sync_port.c:383 sync_port_settings: pushing settings to kernel (swp49: speed=0, duplex=ff, autoneg=1, port=3, mdix=1, nwords=0 sup=40G, adv=40G, lp_adv=)
switchd.log.1:2017-12-14T19:59:35.353291+00:00 s4048-r2r6-u26-leaf08 switchd[19677]: sync_port.c:333 sync_port_settings: pushing settings to hal (swp51: speed=40000, duplex=1, autoneg=1, port=3, mdix=1, nwords=0 sup=40G, adv=40G, lp_adv=40G)
switchd.log.1:2017-12-14T19:59:35.741675+00:00 s4048-r2r6-u26-leaf08 switchd[19677]: sync_port.c:383 sync_port_settings: pushing settings to kernel (swp51: speed=0, duplex=ff, autoneg=1, port=3, mdix=1, nwords=0 sup=40G, adv=40G, lp_adv=)
switchd.log.1:2017-12-14T19:59:36.453327+00:00 s4048-r2r6-u26-leaf08 switchd[19677]: sync_port.c:383 sync_port_settings: pushing settings to kernel (swp49: speed=40000, duplex=1, autoneg=1, port=3, mdix=1, nwords=0 sup=40G, adv=40G, lp_adv=40G)
switchd.log.1:2017-12-14T19:59:36.644116+00:00 s4048-r2r6-u26-leaf08 switchd[19677]: sync_port.c:383 sync_port_settings: pushing settings to kernel (swp51: speed=40000, duplex=1, autoneg=1, port=3, mdix=1, nwords=0 sup=40G, adv=40G, lp_adv=40G)
switchd.log.1:2017-12-14T19:59:40.103683+00:00 s4048-r2r6-u26-leaf08 switchd[19677]: ethtool_swp.c:134 do_settings: pushed settings to hal (swp51: speed=40000, duplex=1, autoneg=0, port=3, mdix=1, nwords=2 sup=40G, adv=40G, lp_adv=40G)
switchd.log.1:2017-12-14T19:59:40.295036+00:00 s4048-r2r6-u26-leaf08 switchd[19677]: sync_port.c:383 sync_port_settings: pushing settings to kernel (swp51: speed=40000, duplex=1, autoneg=0, port=3, mdix=1, nwords=0 sup=40G, adv=, lp_adv=)
switchd.log.1:2017-12-14T19:59:41.116164+00:00 s4048-r2r6-u26-leaf08 switchd[19677]: ethtool_swp.c:134 do_settings: pushed settings to hal (swp49: speed=40000, duplex=1, autoneg=0, port=3, mdix=1, nwords=2 sup=40G, adv=40G, lp_adv=40G)
switchd.log.1:2017-12-14T19:59:41.486303+00:00 s4048-r2r6-u26-leaf08 switchd[19677]: sync_port.c:383 sync_port_settings: pushing settings to kernel (swp49: speed=40000, duplex=1, autoneg=0, port=3, mdix=1, nwords=0 sup=40G, adv=, lp_adv=)
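In these lines the `autoneg=0` change first shows up in a `do_settings` entry from `ethtool_swp.c`, not in `sync_port_settings`. That would be consistent with an explicit settings request rather than something the port decided on its own, though this is an inference from the log format, not something confirmed. A minimal sketch for extracting just the autoneg events for one port, assuming the line format shown above; `autoneg_events` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: read switchd log text on stdin and, for each line about
# the given port, print "<timestamp> <function> autoneg=<N>".
# Assumes the format shown above:
#   [<file>:]<timestamp> <host> switchd[<pid>]: <src.c>:<line> <function>: ... (<port>: ... autoneg=N, ...)
autoneg_events() {
    awk -v port="$1" '
        index($0, "(" port ":") {
            # Drop a "switchd.log.N:" prefix added by grep/zcat over several files.
            ts = $1
            sub(/^[^:]*log[^:]*:/, "", ts)
            # Field 5 is the function name with a trailing colon.
            fn = $5
            sub(/:$/, "", fn)
            if (match($0, /autoneg=[0-9]+/))
                print ts, fn, substr($0, RSTART, RLENGTH)
        }'
}
```

`zcat -f /var/log/switchd.log* | autoneg_events swp49` covers rotated logs too, since `zcat -f` passes uncompressed files through unchanged; piping the result through `sort` puts it in time order.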

We are running Cumulus Linux 3.3.2:
# cat /etc/lsb-release
DISTRIB_ID="Cumulus Linux"
DISTRIB_RELEASE=3.3.2
DISTRIB_DESCRIPTION="Cumulus Linux 3.3.2"


Hi Raghavendra, thanks for reporting this. I asked one of our engineers who works on transceivers, and he hasn't heard of this issue before (we've got some fixes for auto-neg issues in the next release of Cumulus Linux, but this isn't one of them). It's best if you contact our support team and share this information with them (along with the cl-support output).

Submit a support request
Hi Raghavendra,
I am not aware of this specific issue either, and these are the exact same cables we use in the lab. What is connected on the other end of this link? It looks like autoneg completes fine, then about 6 seconds later the settings are changed again, this time with autoneg off. This does not look like a bug; it looks like something is intentionally changing the setting. AN is mandatory for 40G, so it would help to know what the link partner is.
Also, there have been a lot of layer 1 fixes since 3.3.2, so I would recommend upgrading to the latest release and checking whether that resolves it.
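Settings applied with `ethtool -s` are not persistent and can be overwritten whenever the interface configuration is reapplied, so one place worth checking is whether the persistent config for the port disagrees. A minimal sketch, assuming the port is managed by ifupdown2 through `/etc/network/interfaces` with a `link-autoneg` attribute (it does not follow `source` includes); `stanza_autoneg` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print any link-autoneg value configured for an
# interface in an ifupdown2-style interfaces file.
# Usage: stanza_autoneg <iface> [file]   (file defaults to /etc/network/interfaces)
stanza_autoneg() {
    awk -v iface="$1" '
        # An "iface" line starts a stanza; only track the one we asked for.
        $1 == "iface" { in_stanza = ($2 == iface); next }
        # An "auto" line also ends the previous stanza.
        $1 == "auto"  { in_stanza = 0; next }
        in_stanza && $1 == "link-autoneg" { print $2 }
    ' "${2:-/etc/network/interfaces}"
}
```

If `stanza_autoneg swp49` prints `off`, reloading the configuration (for example with `ifreload -a`) could plausibly turn autoneg back off. For the link-partner question, `lldpctl swp49` shows what LLDP has learned on that port, assuming the peer speaks LLDP.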
