Telemetry Data on Cumulus


A new feature in some network products is the ability to stream data from a switch at a much faster interval than SNMP polling. For example, every 10-20 seconds, stream all interface statistics as JSON data to an InfluxDB (or other) server. This allows for more granular interface bandwidth statistics, plus other benefits such as seeing that an interface has started taking errors within a 10-20 second window rather than a 5-minute SNMP polling interval.
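To make the push model concrete, here is a minimal Python sketch, assuming the switch ports appear as ordinary Linux netdevs with cumulative counters under /sys/class/net/<iface>/statistics/. The JSON layout and the names `read_counters`, `to_json` and `stream` are hypothetical, and actually shipping the documents to InfluxDB or another collector is left to the `emit` callback.

```python
import json
import os
import socket
import time

# Cumulative counters the Linux kernel exposes per interface under
# /sys/class/net/<iface>/statistics/ (one integer per file).
COUNTERS = ("rx_bytes", "tx_bytes", "rx_errors", "tx_errors")


def read_counters(iface, sysfs_root="/sys/class/net"):
    """Read the selected cumulative counters for one interface."""
    stats_dir = os.path.join(sysfs_root, iface, "statistics")
    values = {}
    for name in COUNTERS:
        with open(os.path.join(stats_dir, name)) as f:
            values[name] = int(f.read().strip())
    return values


def to_json(host, iface, counters, ts):
    """Serialize one sample; the field layout is a hypothetical schema."""
    return json.dumps(
        {"host": host, "iface": iface, "ts": ts, "counters": counters},
        sort_keys=True,
    )


def stream(ifaces, interval_s=10, emit=print, samples=None,
           sysfs_root="/sys/class/net"):
    """Emit one JSON document per interface every interval_s seconds.

    Runs forever unless `samples` is set; in a real deployment `emit`
    would push each document to a collector instead of printing it.
    """
    host = socket.gethostname()
    n = 0
    while samples is None or n < samples:
        ts = time.time()
        for iface in ifaces:
            emit(to_json(host, iface, read_counters(iface, sysfs_root), ts))
        n += 1
        if samples is None or n < samples:
            time.sleep(interval_s)
```

For example, `stream(["swp1", "swp2"], interval_s=10)` prints one document per port every 10 seconds until interrupted.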

Juniper is doing this, as is Cisco, and I think some others. Juniper has published an open source telemetry collector here:

https://forums.juniper.net/t5/Analytics/Open-Source-Universal-Telemetry-Collector-for-Junos/ba-p/288...

I am curious if Cumulus is doing anything in this space, or if anyone is doing anything with something like Intel Snap (https://github.com/intelsdi-x/snap), which has an ethtool plugin that I think would get this working for interface stats. I have not had a chance to try it yet to see whether Intel Snap will install on Cumulus.

I think this space is interesting because it's not limited to interface stats; it could be anything (CPU, memory, process state, etc.), and it seems to be less CPU-intensive than something like super-aggressive SNMP polling intervals (especially on switches with 10-cent CPUs in them...).

Really just curious to see what others think, or if anyone is already doing something around this with Cumulus.

Thanks,

Will


Was able to do some initial testing today with this idea.

Good news: snapd seems to run fine (or at least, the daemon will run...).

Bad news: snapd doesn't currently have a mode where it runs as an actual daemon; you have to run it and then use tmux for multiple sessions.

More bad news: the ethtool plugin fails to load with an EOF error. This may be because of the (seemingly) rather old version of ethtool that Cumulus ships with. My 2.5 VX instance and my physical switch both have ethtool 3.4.2, whereas an Ubuntu VM with ethtool 3.13 loads the plugin just fine.

I've filed an issue on GitHub with the plugin developer; we will see where that takes us.

In the meantime, until Intel Snap can be run as a daemon properly, its use to me is pretty limited.

Cheers,

Will
Another update: there is an open source tool from InfluxData called Telegraf which I think will fit this use case perfectly. Its 'net' plugin polls data from all interfaces (by default) every 10 seconds (configurable). It's not yet clear what kind of CPU impact, if any noticeable, this may have on a Cumulus switch.

Was able to install it without issue on a VX, and it's properly sending data to an InfluxDB server in the lab.
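Telegraf's net plugin reports cumulative counters, so bandwidth and error bursts have to be derived from consecutive samples (in InfluxDB this is usually done at query time with InfluxQL's `NON_NEGATIVE_DERIVATIVE()`). A minimal Python sketch of that arithmetic follows; the `Sample` fields are assumed to mirror the plugin's `bytes_recv` and `err_in` fields (check the output of your Telegraf version), and `Sample`, `non_negative_rate` and `analyze` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    ts: float        # seconds since the epoch
    bytes_recv: int  # assumed cumulative receive-byte counter for one interface
    err_in: int      # assumed cumulative input-error counter


def non_negative_rate(prev, curr, dt):
    """Per-second rate between two cumulative readings.

    Returns None when the counter went backwards (reboot, counter clear),
    like a non-negative derivative, instead of a huge negative rate.
    """
    if dt <= 0 or curr < prev:
        return None
    return (curr - prev) / dt


def analyze(samples):
    """Yield (ts, rx_bps, new_errors) for each consecutive pair of samples."""
    for a, b in zip(samples, samples[1:]):
        dt = b.ts - a.ts
        rate = non_negative_rate(a.bytes_recv, b.bytes_recv, dt)
        rx_bps = None if rate is None else rate * 8  # bytes/s -> bits/s
        # Clamp at 0 so a counter reset is not reported as negative errors.
        new_errors = max(b.err_in - a.err_in, 0)
        yield b.ts, rx_bps, new_errors
```

With 10-second samples, an interface that starts taking errors shows up as a non-zero `new_errors` in the very next interval rather than after a 5-minute poll.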
Yes Will, I have been looking at Intel Snap. I did some experimentation with it and started to write a plugin for Quagga to pull some statistics that aren't available via any of the built-in plugins, but I haven't had time to finish the experimentation. As a consultant, I have been busy helping deploy networks for paying customers, so the R&D work goes on the back burner. I got as far as setting up two VX nodes, testing the collection of interface stats to a temp file, and learning enough Go to collect a few test statistics from Quagga. There's still a good bit of work left to do here. We're also exploring a more custom solution internally that is very similar to Snap: defining what to collect, collecting it, and sending it to a data collector. Expect to hear more on that around the 3.2 release.
Have you considered enabling sFlow?
https://cumulusnetworks.com/blog/cumulus-networks-sflow-and-data-center-automation/

Architecturally, sFlow is very similar to OpenConfig Telemetry that Cisco and Juniper are implementing on their carrier routers:
http://blog.sflow.com/2016/06/streaming-telemetry.html

In addition to streaming interface counters, host stats, and packet samples, the Cumulus Linux sFlow implementation includes ASIC resource monitoring:
http://blog.sflow.com/2015/02/broadcom-asic-table-utilization-metrics.html
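sFlow packet samples are taken 1-in-N, so the collector scales them back up to estimate totals. A minimal Python sketch of that arithmetic, assuming a fixed sampling rate N over one reporting interval; `percent_error_95` encodes the commonly quoted sFlow rule of thumb that the error at 95% confidence is at most 196 * sqrt(1/c) percent for c samples. Both function names are hypothetical.

```python
import math


def estimate_traffic(sampled_frame_lengths, sampling_rate, interval_s):
    """Scale 1-in-N packet samples up to estimated totals for one interval.

    sampled_frame_lengths: lengths in bytes of the frames that were sampled.
    sampling_rate: N, i.e. on average one frame in N is sampled.
    """
    est_frames = len(sampled_frame_lengths) * sampling_rate
    est_bytes = sum(sampled_frame_lengths) * sampling_rate
    est_bps = est_bytes * 8 / interval_s
    return est_frames, est_bytes, est_bps


def percent_error_95(num_samples):
    """Rule-of-thumb error bound (percent, 95% confidence) for num_samples samples."""
    if num_samples <= 0:
        raise ValueError("need at least one sample")
    return 196 * math.sqrt(1 / num_samples)
```

For example, 100 samples give an error bound of about 19.6%, while 10,000 samples bring it down to about 1.96%, so busier links or longer intervals yield tighter estimates.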

Additional features are being added to the open source sflow.net agent that runs on Cumulus. You can compile from source to get the latest changes.
