Detecting service quality issues

Submitted by fredrik.nyman on Tue, 03/24/2020 - 15:31

While networks handle the increased load from Corona with ease, not all services can say the same. Detecting quality issues in service delivery is much more difficult than checking whether the network itself can handle the load. There are probably a million reasons why service quality can degrade even when the network seems to be working fine. The trick to detecting and troubleshooting these situations is the use of telemetry and service assurance.

I was interviewed by the Swedish tech site TelekomIdag about the Corona impact on networks and this specific topic.

The article, in Swedish, is here: https://telekomidag.se/experten-sa-undviker-operatorer-overbelastning-i-naten/

As I wrote in my last blog post, the data from Sweden shows that the networks are coping well with the increased load. Overall things are good, yet users still experience quality issues with some of their services. One user I talked to was unable to watch the prime minister's speech to the nation last Sunday: the all too famous buffering, buffering, buffering of video prevented her from receiving the live broadcast from Swedish Television over the network.

One might think that the main sources of information in a crisis, such as state television, should always work. People rely on them. So if vital services break down and a big portion of the user base is affected, operators need to become aware really fast. In this case, the problem was certainly not due to overloaded networks, as the traffic data shows. So what was it?

It might have been unreliable WiFi in the user's home, with lots of WiFi networks active at the same time. It could also have been a local problem in this user's neighborhood, such as a congested local link. It could also have been a capacity problem in the content delivery network the TV station uses, which is outside the control of the network operator.

It could have been any of a zillion possible reasons. Either way, we will never know, because not enough information was collected to review what was happening, or for the operator to detect the problem and react to it. Even if this single event was the first signal of a bigger issue, for most operators it would need to get a lot worse before they would react, given the tools they tend to use today.

The point I am making in the interview is that there are solutions available that can help shed more light on these situations.

Waystream has introduced telemetry in all of our products, a feature all our customers can now use on both old and new platforms. Together with service assurance, where the network equipment performs passive monitoring or active testing (sending real traffic the same way the end users do), it is possible to get a view of the service quality in all corners of the network.

Telemetry is an efficient method for the equipment to upload operational status information at high frequency. Thousands of parameters that are constantly measured and updated get uploaded every minute, or even more often if needed. That creates deeper insight into network performance. Using tools for anomaly detection, operators can process huge amounts of data and get an early warning when things start to deteriorate.
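
To make the anomaly detection idea a bit more concrete, here is a minimal Python sketch. It is not how any particular product does it; it simply assumes a per-minute counter (the hypothetical `drops_per_minute` list) and flags samples that deviate strongly from the previous few minutes using a rolling z-score. The function name, window size and threshold are all illustrative choices.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(samples, window=10, threshold=3.0):
    """Return indices of samples that deviate strongly from the recent past.

    Each sample is compared with the mean and standard deviation of the
    previous `window` samples (a rolling z-score).
    """
    history = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(samples):
        # Only judge a sample once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma == 0:
                # Perfectly flat baseline: any change counts as a deviation.
                deviates = value != mu
            else:
                deviates = abs(value - mu) / sigma > threshold
            if deviates:
                anomalies.append(index)
        history.append(value)
    return anomalies


# Hypothetical per-minute packet-drop counters from one access port.
drops_per_minute = [2, 3, 2, 4, 3, 2, 3, 4, 2, 3, 3, 2, 40, 3, 2]
print(detect_anomalies(drops_per_minute))
```

In practice the same check would run over thousands of parameters in parallel, and a real deployment would likely use something more robust than a plain z-score, but the principle of comparing each new sample with its own recent baseline is the same.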

If service assurance, such as quality monitoring of TV, is part of the dataset, operators can learn instantly where problems occur, which users and areas are affected, and what type of problem is being experienced. This means faster resolution, perhaps even before users start calling in to report the issue, and the ability to prioritize correctly: which problems to address first.
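
As a rough illustration of that prioritization step, the sketch below groups quality reports by area and problem type and ranks them by the number of distinct affected users. The report format (`area`, `problem`, `user` fields) and the sample values are assumptions made up for this example, not an actual data model.

```python
from collections import defaultdict


def rank_problems(reports):
    """Group quality reports by (area, problem) and rank by affected users."""
    affected = defaultdict(set)
    for report in reports:
        # A set keeps each user counted once, even if reported repeatedly.
        affected[(report["area"], report["problem"])].add(report["user"])
    rows = [
        (area, problem, len(users))
        for (area, problem), users in affected.items()
    ]
    # Most affected users first; ties are broken alphabetically.
    return sorted(rows, key=lambda row: (-row[2], row[0], row[1]))


# Hypothetical TV quality reports collected from the network equipment.
reports = [
    {"area": "north", "problem": "packet_loss", "user": "u1"},
    {"area": "north", "problem": "packet_loss", "user": "u2"},
    {"area": "north", "problem": "packet_loss", "user": "u2"},
    {"area": "south", "problem": "jitter", "user": "u3"},
]
for area, problem, count in rank_problems(reports):
    print(area, problem, count)
```

The top of the list is then the natural candidate for what to address first.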

Telemetry and service assurance are really cool features and they make perfect sense to have in the key infrastructure of today - the fiber networks.

Get in touch with us if you want to know more about how telemetry and service assurance work. It's actually really simple!

Blog posts

How do you troubleshoot IoT devices?

Submitted by fredrik.nyman on Fri, 02/15/2019 - 13:00

Continuing on the subject of troubleshooting the network. Troubleshooting MPEG video has the benefit of a user who can tell you if it doesn't work, and you can simply ask that user if the problem persists once you have fixed it. But what if there isn't any obvious way to determine if things are working? For example, is that trashcan really signalling that it's full, or does the temperature device really update the building climate control properly?

How to see what your users see

Submitted by fredrik.nyman on Mon, 02/11/2019 - 10:21

Live broadcast TV is one of the most popular services in fibre networks. You can get high quality pictures because there is enough bandwidth to send video uncompressed. But the nature of broadcast media is that it is very sensitive to packet loss or jitter. There is no retransmission of packets because it is live – you can’t hold the stream to get a lost packet back.

FTTH is not like any other network

Submitted by fredrik.nyman on Fri, 01/25/2019 - 13:34

If you are working in network engineering, hands-on with the routers and switches in the network, you have probably seen your fair share of network problems. However well you build it, there is always some intermittent issue, some complaining user, some application that doesn’t get the throughput, some website that is unreachable.

It’s part of the everyday chaos of running a network to deal with big and small issues.

The Way Better Blog

Submitted by fredrik.nyman on Fri, 01/25/2019 - 10:02

In this blog I will be writing about some of the topics, big and small, facing network engineers and fibre networks and the kind of challenges I have encountered working with our customers over the past 20 years or so.