Many networks rely on SNMP and ping to read out statistics and verify that each network element is reachable. This passive and centralized monitoring tells you that links are working and that systems are reachable from the network monitoring system (NMS), and it may provide a few interface traffic graphs that show whether links are getting close to overload.
But it does not give you detailed insight into how each network element is performing or what issues could be affecting the traffic passing through those elements.
With Waystream's streaming telemetry, approximately 1300 datapoints can be collected from the switches and continuously exported to a database. These datapoints give a deeper insight into the performance and operational status of each switch, including actions taken that affect user traffic.
By collecting, visualizing and analysing the measurements it is possible to learn more about the network health and to better troubleshoot issues affecting customers despite "all-green" status in the NMS.
Benefits include:
- Proactive detection of anomalies allows problems to be resolved before they affect customers.
- Reactive troubleshooting becomes faster because datasets can be visualized over time to detect abnormal values and correlate them with observed service degradation or other effects.
- After-the-fact analysis of problems makes it possible to implement preventive measures that stop the problem from happening again.
The solution works by enabling a telemetry subscription in the switch that specifies which measurements to include and at what upload intervals. The switch collects the datapoints at the set intervals and uploads the data to the configured destination. If the destination is unavailable, for example due to a network problem, the switch stores the collected datapoints (and subsequent collections) and uploads them once the destination can be reached again. This capability allows collection to continue autonomously, independent of the central system. Such datapoints can be vital in determining the root cause of an outage or other network problem, and they also allow a large pool of data to be built up to which more advanced technologies, such as artificial intelligence/machine learning tools, can be applied.
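The store-and-forward behaviour described above can be illustrated with a minimal Python sketch. It is not the switch's implementation: the class name, the `send` callback and the bounded queue size are assumptions introduced purely for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict


class StoreAndForwardBuffer:
    """Queue datapoints locally and upload them in order once the destination is reachable.

    Hypothetical illustration only; not Waystream's implementation.
    """

    def __init__(self, send: Callable[[Dict], None], max_points: int = 10_000) -> None:
        # `send` is an assumed callback that raises OSError when the destination is down.
        self._send = send
        # Assumption: storage is bounded, so the oldest datapoint is dropped when full.
        self._pending: Deque[Dict] = deque(maxlen=max_points)

    def collect(self, point: Dict) -> None:
        """Store a newly collected datapoint, then try to upload everything pending."""
        self._pending.append(point)
        self.flush()

    def flush(self) -> int:
        """Upload pending datapoints oldest first; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                break  # destination unreachable: keep the data for a later attempt
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        """Number of datapoints waiting to be uploaded."""
        return len(self._pending)
```

In this sketch a datapoint is removed from the queue only after `send` returns without error, so a failed upload never loses data that is still within the queue limit.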
The telemetry feature enables subscription to a large set of measurements, including:
- Interface counters for traffic, drops, errors
- Interface optical module transmit/receive levels
- Environmental data such as temperature, fan operation, power status
- Multicast TV traffic analysis of bandwidth, delay and lost/error packets
- Detailed packet drop counters and reason
- Packet flow counters and buffer allocation
- System backplane queue load and drops
- System uptime, model, software version etc.
- System memory and load including memory bus utilization and ECC error correction
- Individual process CPU consumption and memory consumption
- NPU load and packet handling per traffic class
Some datasets may be platform dependent.
The supported telemetry upload formats are:
- Native InfluxDB line-protocol over HTTP
- JSON objects over TCP
The InfluxDB line-protocol format allows fast and easy integration with InfluxDB, the popular open-source time-series database. The network can start to provide telemetry data in seconds, and any tool that can operate on data in the Influx database can also be used to process the telemetry from the access network.
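To give a feel for what a line-protocol record looks like, the following Python sketch formats one datapoint as a single line. The measurement, tag and field names in the usage note are hypothetical and are not taken from the switch's actual datasets; the function covers only the common cases of the format (tags, integer, float, boolean and string fields, nanosecond timestamp).

```python
from typing import Dict, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape each of the given characters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line(measurement: str, tags: Dict[str, str],
            fields: Dict[str, FieldValue], timestamp_ns: int) -> str:
    """Render one datapoint as an InfluxDB line-protocol line."""
    # Measurement names escape commas and spaces; tag keys/values also escape '='.
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys are recommended for write performance
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")

    parts = []
    for key, value in fields.items():
        if isinstance(value, bool):       # check bool first: bool is a subclass of int
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"        # integer fields carry an 'i' suffix
        elif isinstance(value, float):
            rendered = repr(value)
        else:                             # strings are double-quoted
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = '"' + escaped + '"'
        parts.append(_escape(key, ",= ") + "=" + rendered)

    return f"{head} {','.join(parts)} {timestamp_ns}"
```

For example, `to_line("interface", {"host": "sw1", "port": "1"}, {"rx_bytes": 1024, "rx_drops": 0}, 1700000000000000000)` yields `interface,host=sw1,port=1 rx_bytes=1024i,rx_drops=0i 1700000000000000000`.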
The JSON format provides a flexible and well-known packaging of measurement objects that enables easy processing of datapoints. Many programming languages support JSON-object handling, which makes the telemetry interoperable with a large set of tools and solutions.
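As an illustration of how little code is needed on the receiving side, the Python sketch below pulls complete JSON objects out of text received from a TCP stream. How the switch frames objects on the wire is not specified here, so the sketch assumes objects are simply sent one after another, optionally separated by whitespace or newlines; the function name and the keys in the usage note are hypothetical.

```python
import json
from typing import Any, List, Tuple

_decoder = json.JSONDecoder()


def extract_objects(buffer: str) -> Tuple[List[Any], str]:
    """Return all complete JSON values in `buffer` plus the unparsed remainder.

    Assumption: the sender writes JSON objects back to back, optionally
    separated by whitespace. An incomplete trailing object is left in the
    remainder so the caller can prepend it to the next chunk received.
    """
    objects: List[Any] = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(buffer) and buffer[pos].isspace():
            pos += 1
        if pos >= len(buffer):
            break
        try:
            obj, end = _decoder.raw_decode(buffer, pos)
        except json.JSONDecodeError:
            break  # most likely an incomplete object: wait for more data
        objects.append(obj)
        pos = end
    return objects, buffer[pos:]
```

A receiver would decode each received chunk to text, call `extract_objects(remainder + chunk)`, process the returned objects and keep the new remainder. For instance, `extract_objects('{"a": 1}\n{"b": 2}{"c"')` returns `[{"a": 1}, {"b": 2}]` together with the remainder `'{"c"'`. A production receiver would also need to distinguish malformed data from incomplete data, which this sketch does not attempt.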