June 6, 2019

By Jonas Krogell, Technical Product Manager, Netrounds

Giving away control of business-critical infrastructure to a third party relieves you of operational responsibility - but it also takes away your insight into whether the network is available and how it's performing. Take, for example, the recent Google Cloud Platform (GCP) outage that left dependent Google customers without services for over four hours. That is several hours too many for customers who depend on those services to work. If you are interested in our analysis of the data collected that day by the Netrounds Test Agents we have in GCP, including the instant detection of a key performance indicator (KPI) degradation and service level agreement (SLA) violation in the traffic, read our blog post here.

Anyone involved in the shift to network function virtualization (NFV), or moving more and more to the cloud as part of the software-defined networking in a wide area network (SD-WAN) movement, will benefit from reading further in this blog. As a panel of virtualization experts recently stated, “Netrounds restores visibility that is often lost as part of network virtualization.” You can read more about this in our newsroom here.

The "Black Box" in Multicloud Deployments

Hybrid and multi-cloud designs are a very strong trend among the majority of organizations today. They come with a great list of benefits, the greatest being agility. But a crucial point that is far too often overlooked from an active service assurance perspective is that the resources are no longer in your own local network. They run remotely in somebody else's datacenter, over a network that you do not own or control, which makes them virtually a black box.

In a classic network you would have a collection of switches and routers implementing the network: physical devices that are under your complete control. You would typically monitor them using SNMP, syslog, ICMP ping, flow data, etc. This gives some level of insight into whether the network is actually forwarding packets and whether interfaces are congested or showing CRC errors. When a device breaks, you can typically pick that up using these tools. But how do you identify when an error occurs outside of your network? And before end users are affected? Remember the GCP outage mentioned above?

Connections to cloud environments can be made in a number of ways, commonly over the plain Internet, either natively or through VPNs (IPsec, SD-WAN), or by buying a leased line directly to the cloud provider. When running over the Internet, it's important to remember that the Internet is always a best-effort service, but even a leased line may suffer degradations that go unnoticed by passive monitoring tools.

So, what's the best way to do service assurance of a network that you do not fully control? By using it. That's really the only way to know if it works end to end. Luckily, Netrounds has the tools to automatically test and monitor the network by placing active (synthetic) traffic on the data plane, providing end-to-end visibility, including in a hybrid/multi-cloud environment. See more about multi-cloud deployments in our webinar, “A New Level of Visibility Across Multi Cloud Environments and Service Chains.”

Netrounds provides a cloud-ready virtual machine that can be deployed in all regions and availability zones used in the clouds and on the corresponding on-prem platforms - there are even bare-metal options for sites without any compute resources.

The next step is to configure each Netrounds Test Agent to automatically monitor and test the network, either by sending traffic to a remote Netrounds Test Agent or by consuming a service from a content server - in other words, using the network just as a regular user does (and with only a small footprint).
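To make the idea of "using the network like a regular user" concrete, here is a minimal sketch of an active probe in Python. It is not how Netrounds Test Agents are implemented; it simply opens a handful of TCP connections to a target and records how long each handshake takes and how many fail. The names `ProbeResult` and `probe_tcp`, the target `example.com:443`, and the probe count and timeout are all assumptions made for illustration.

```python
import socket
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    sent: int                  # number of connection attempts
    latencies_ms: List[float]  # one entry per successful attempt

    @property
    def loss_pct(self) -> float:
        """Share of attempts that did not complete, in percent."""
        if self.sent == 0:
            return 0.0
        return 100.0 * (self.sent - len(self.latencies_ms)) / self.sent

    @property
    def avg_latency_ms(self) -> Optional[float]:
        """Mean handshake time, or None if every attempt failed."""
        if not self.latencies_ms:
            return None
        return sum(self.latencies_ms) / len(self.latencies_ms)


def probe_tcp(host: str, port: int, count: int = 5,
              timeout: float = 2.0) -> ProbeResult:
    """Open `count` TCP connections and time each handshake."""
    latencies: List[float] = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                latencies.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            # Timeouts, refused connections and DNS failures all
            # count as a lost probe in this sketch.
            pass
    return ProbeResult(sent=count, latencies_ms=latencies)


if __name__ == "__main__":
    # Hypothetical target; replace with a service you depend on.
    result = probe_tcp("example.com", 443)
    print(f"loss={result.loss_pct:.0f}% avg={result.avg_latency_ms} ms")
```

A TCP handshake is only one of many possible synthetic transactions; the point is that the measurement comes from real traffic crossing the same path your users take, rather than from device counters.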

When the Netrounds solution detects a service degradation, an alarm may be triggered and troubleshooting can start right away. If you’re not continuously testing and monitoring the network by using it with active tools, you are instead waiting for your users to find and report the fault - and this often turns into a blame game over whether it’s a network issue or an application issue.
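As a rough illustration of how measurements can turn into an alarm, the Python sketch below compares a series of samples against SLA thresholds and raises an alarm once several consecutive samples are in violation. This is an assumption-laden example, not the Netrounds alarm logic: the `Sla` and `Sample` types, the threshold values, and the "three in a row" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Sla:
    max_latency_ms: float  # hypothetical latency threshold
    max_loss_pct: float    # hypothetical loss threshold


@dataclass(frozen=True)
class Sample:
    latency_ms: float
    loss_pct: float


def violates(sample: Sample, sla: Sla) -> bool:
    """True if the sample exceeds either SLA threshold."""
    return (sample.latency_ms > sla.max_latency_ms
            or sample.loss_pct > sla.max_loss_pct)


def alarm_indices(samples: Iterable[Sample], sla: Sla,
                  consecutive: int = 3) -> List[int]:
    """Return the index of each sample that completes a run of
    `consecutive` violations, i.e. where an alarm would be raised.
    A run raises one alarm only; a compliant sample resets the run."""
    raised: List[int] = []
    streak = 0
    for i, sample in enumerate(samples):
        streak = streak + 1 if violates(sample, sla) else 0
        if streak == consecutive:
            raised.append(i)
    return raised


if __name__ == "__main__":
    sla = Sla(max_latency_ms=50.0, max_loss_pct=1.0)
    history = [
        Sample(20.0, 0.0),
        Sample(80.0, 0.0),
        Sample(90.0, 2.0),
        Sample(30.0, 5.0),
        Sample(25.0, 0.0),
    ]
    print(alarm_indices(history, sla))
```

Requiring a few violations in a row is one simple way to avoid alarming on a single bad sample; a real deployment would tune this trade-off between detection speed and noise.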

How Service Assurance (Does Not) Work Today

Survey after survey tells us that in 40 to 75% of cases, customers/end users are the ones detecting and reporting network issues. I think we can all agree that is a lot. In addition to that shocking number, surveys also report the following:

  • Lost revenue
  • Damaged reputation
  • Productivity loss
  • Majority of services are not tested
  • Majority of issues are not faults

So, without a proper understanding of end-user quality, the consequences are frustration and inefficiency for employees, increased support escalations and, if you are a service provider, potentially churn. This is the situation today, and those numbers were gathered before accounting for the growing number of complex deployments, including multi-cloud and service chains.

By adopting an active testing platform such as Netrounds:

  • Engineering teams will be able to launch tests on demand, troubleshoot problems much faster and reduce MTTR.
  • Operational teams will be able to understand the end-to-end service chain and confirm that all important customers can reach their critical business applications in the cloud.
  • Service Account Managers will have access to activation test certificates to prove an SD-WAN, uCPE or IP-VPN service has been turned up (or changed) properly.
  • Customer Care will have better instrumentation of your networks so that high-value customers and Account Managers understand if a service is delivered.

Download “The SLA Exposé: Are Users Right in Complaining About Their Networks?” to learn more.

In part 2 of this blog, we will look at how easy it is to actually deploy Netrounds Test Agents in cloud platforms. Stay tuned.