Detecting service quality issues
While networks handle the increased load from Corona with ease, not all services can say the same. Detecting quality issues in service delivery is much more difficult than checking whether the network itself can handle the load. There are probably a million reasons why service quality can degrade even when the network seems to be working fine. The trick to detecting and troubleshooting these situations is the use of telemetry and service assurance.
I was interviewed by the Swedish tech site TelekomIdag about the Corona impact on networks and this specific topic.
The article, in Swedish, is here: https://telekomidag.se/experten-sa-undviker-operatorer-overbelastning-i-naten/
As I wrote in my last blog post, the data from Sweden shows that the networks are coping well with the increased load. Overall things are good, yet users still experience quality issues with some of their services. One user I talked to was unable to watch the prime minister's speech to the nation last Sunday - the all too famous buffering, buffering, buffering of video prevented her from receiving the live broadcast from Swedish Television over the network.
One might think that the main sources of information in a crisis, such as state television, should always work. People will rely on them. So if vital services break down and a big portion of the user base is affected, operators need to become aware really fast. In this case, the problem certainly was not due to overloaded networks, as the graphs show. So what was it?
It might have been unreliable WiFi in the user's home - lots of WiFi networks active at the same time. It could also have been a local problem in this user's neighborhood, such as a congested local link. It could also have been a capacity problem with the content delivery network the TV station uses, outside of the control of the network operator.
It could have been any of a zillion possible reasons. Either way, we will never know, because not enough information was collected to review what was happening, or for the operator to detect the problem and react to it. Even if this single event was the first signal of a bigger issue, for most operators it would need to get a lot worse before they would react with the tools they tend to use today.
The point I am making in the interview is that there are solutions available that can help shed more light on these situations.
Waystream has introduced telemetry in all of our products, a feature all our customers can now use on both old and new platforms. Together with service assurance, where the network equipment performs passive monitoring or active testing (sending real traffic the same way the end-users do), it is possible to get a view of the service quality in all corners of the network.
Telemetry is an efficient method for the equipment to upload operational status information at high frequency. Thousands of parameters that are constantly measured and updated get uploaded every minute - or even more often if needed. That creates deeper insight into network performance. Using tools for anomaly detection, operators can process huge amounts of data and get early warning when things start to deteriorate.
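To make the idea of anomaly detection on telemetry a bit more concrete, here is a minimal sketch in Python. It is not how any particular product does it; it simply flags a sample when it deviates strongly from the recent history of the same parameter. The function name, the window size, the threshold and the packet-loss numbers are all made up for illustration.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(samples, window=10, threshold=3.0):
    """Return the indices of samples that deviate more than `threshold`
    standard deviations from the mean of the preceding `window` samples."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(samples):
        # Only judge a sample once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip perfectly flat history to avoid flagging tiny changes.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical per-minute packet-loss percentages from one access port.
packet_loss = [0.1, 0.2, 0.1, 0.15, 0.1, 0.2, 0.12, 0.1, 0.18, 0.1, 0.11, 4.5, 0.1]
print(detect_anomalies(packet_loss))
```

Real anomaly detection tools are more sophisticated than this (they handle daily patterns, many parameters at once and so on), but the principle is the same: the more often the data arrives, the earlier a deviation like this stands out.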
If service assurance, such as quality monitoring of TV, is part of the dataset, operators can learn instantly where problems occur, which users and areas are affected, and what type of problem is being experienced. This means faster resolution, perhaps even before users start calling in to report the issue, and the ability to prioritize correctly - which problems to address first.
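As a rough illustration of the prioritization step, the sketch below groups service assurance reports by area and problem type and ranks them by the number of affected users. The report format (the "user", "area" and "status" fields) and the sample data are assumptions made for this example, not an actual data model.

```python
from collections import defaultdict


def rank_problem_areas(reports):
    """Group non-ok reports by (area, problem type) and rank the groups
    by the number of distinct affected users, largest first."""
    affected = defaultdict(set)
    for report in reports:
        if report["status"] != "ok":
            affected[(report["area"], report["status"])].add(report["user"])
    # Sort by user count (descending), then by key for a stable order.
    ranked = sorted(affected.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(area, problem, len(users)) for (area, problem), users in ranked]


# Hypothetical TV quality reports collected from the network equipment.
reports = [
    {"user": "u1", "area": "north", "status": "buffering"},
    {"user": "u2", "area": "north", "status": "buffering"},
    {"user": "u3", "area": "north", "status": "ok"},
    {"user": "u4", "area": "south", "status": "packet_loss"},
    {"user": "u5", "area": "south", "status": "ok"},
]
for area, problem, count in rank_problem_areas(reports):
    print(area, problem, count)
```

With a list like this, the problem that hits the most users ends up on top, which is a reasonable first guess for what to address first.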
Telemetry and service assurance are really cool features and they make perfect sense to have in the key infrastructure of today - the fiber networks.
Get in touch with us if you want to know more about how telemetry and service assurance work. It's actually really simple!