By Jonas Krogell, Technical Product Manager, Netrounds
In part one, we discussed how Netrounds restores the end-to-end network visibility that is often lost with network virtualization, and why classical assurance systems cannot identify degradation in end-to-end service chains, let alone prevent it by ensuring that every element of the chain performs well.
In this second part, we will discuss:
- How to deploy Netrounds Test Agents in cloud platforms
- What platform and performance level to choose
- How to automate Test Agent deployments with cloud-init
- How to easily set up three Test Agents to initiate some intra-region measurements
The big three cloud providers are all covered from our side, and Netrounds Test Agents can be downloaded directly from the AWS Marketplace and via our app for Google Cloud and Microsoft Azure. Let's look at how a typical hybrid cloud design could look, using Amazon Web Services (AWS) as an example.
The above diagram is based on the AWS reference design (https://docs.aws.amazon.com/vpc/latest/adminguide/Introduction.html) for how to build VPNs in order to reach AWS VPCs. In this design, one Test Agent in an EC2 instance would be deployed in each VPC subnet and in each availability zone. The other Test Agents sit at the remote customer locations; they may run on a different platform (VMware, Cisco, KVM) or may even be small physical Test Agents. These remote Test Agents would then be configured to continuously test and monitor the connectivity inside the VPN connection to each availability zone. If this testing and monitoring detects a degradation or fault in the connectivity, it is very likely that users at the same office location are also having trouble reaching their cloud services.
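To make the design concrete, here is a minimal sketch of how the monitoring pairs could be enumerated: every remote Test Agent is paired with every cloud-side Test Agent. The agent names are hypothetical placeholders, and this is only an illustration of the pairing logic, not how Netrounds Control Center configures monitors.

```python
from itertools import product

# Hypothetical agent names; the post does not list real sites or subnets.
remote_agents = ["office-a", "office-b"]
cloud_agents = ["vpc-subnet-1/az-1", "vpc-subnet-2/az-2"]


def monitoring_pairs(remote, cloud):
    """Return every (remote agent, cloud agent) pair to monitor over the VPN."""
    return list(product(remote, cloud))


pairs = monitoring_pairs(remote_agents, cloud_agents)
for src, dst in pairs:
    print(f"monitor {src} -> {dst}")
```

With two offices and two availability zones this gives four monitored paths, so a fault can be narrowed down to a specific office, zone, or VPN connection.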
Platforms and Performance
In AWS you can deploy Test Agents on most of the modern instance types, ranging from the smallest and cheapest t2/t3 series all the way up to the Nitro-based, network-optimized c5n.18xlarge (100GE!) instance type. However, a good middle ground is the c5.large/xlarge, which provides one or two dedicated CPUs at a low monthly cost. These instances are still able to run many Gbit/s of test traffic when needed.
If you are running Google Cloud Platform (GCP), then an n1-standard-1/2 is a perfect starting point that provides performance similar to an AWS c5.large/xlarge. Note that GCP enforces bandwidth limits a bit differently than AWS: it limits egress networking to 2 Gbit/s per vCPU and caps it at a total of 16 Gbit/s per compute instance. In Microsoft Azure the general-purpose B-family is typically a good choice, for example a Standard_B2s.
Test Agents store no measurements on their local storage; instead, all measurements are sent northbound to the Netrounds Control Center. This means that the Test Agent only needs a minimal disk image containing the code to boot and perform its tasks; the size of this image is 2 GB (~200 MB when compressed).
When picking a Test Agent instance type there are two parameters to keep in mind: how accurate the latency measurements need to be, and what maximum throughput is needed. Avoiding overbooked burstable instances is usually a good idea to keep latency measurements as accurate as possible; otherwise there is a risk that other VMs sharing the same hardware could affect the latency and latency variation (jitter) values reported by the Test Agent. The throughput a Test Agent can produce when doing speed tests such as single- or multisession TCP or RFC 6349 typically correlates with the number of assigned CPU cores: the more CPU cores the instance has, the higher the achievable network throughput.
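As a quick illustration of the GCP rule described above, the egress ceiling for an instance can be estimated from its vCPU count. This is a sketch using only the two figures quoted in this post (2 Gbit/s per vCPU, 16 Gbit/s cap); the function name is my own, and the limits themselves may change over time.

```python
# Figures quoted in the text above; treat them as a snapshot, not a guarantee.
GCP_EGRESS_PER_VCPU_GBPS = 2.0
GCP_EGRESS_CAP_GBPS = 16.0


def gcp_egress_limit_gbps(vcpus: int) -> float:
    """Estimate the egress bandwidth ceiling for a GCP instance."""
    if vcpus < 1:
        raise ValueError("an instance needs at least one vCPU")
    return min(vcpus * GCP_EGRESS_PER_VCPU_GBPS, GCP_EGRESS_CAP_GBPS)


for vcpus in (1, 2, 8, 16):
    print(f"{vcpus} vCPU -> {gcp_egress_limit_gbps(vcpus)} Gbit/s")
```

By this estimate, an n1-standard-1 (1 vCPU) tops out at 2 Gbit/s and an n1-standard-2 (2 vCPUs) at 4 Gbit/s, which is worth keeping in mind when sizing an instance for throughput tests.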
For example, when we deployed two Test Agent instances on c5n.18xlarge types in AWS and ran a multisession TCP test, this is what we saw:
A throughput of 92 Gbit/s is very impressive between two virtual machines in a cloud environment using a single network interface!
Automating Deployments Using Cloud-init
Netrounds is a strong supporter of network automation: it is important to us that Test Agents are quick and easy to deploy, both locally on-premises and in clouds. The Test Agents can receive their basic configuration through a method called cloud-init; this is all they need to “call home” to their Control Center and receive further instructions. No manual provisioning of the Test Agents is required, and the full deployment process can be handled by scripts and tools. To illustrate how simple this concept is, let’s walk through a manual cloud-init deployment using AWS.
Select the AMI you want to launch:
Select the appropriate instance type:
Configure appropriate VPC details, then enter the cloud-init data under “Advanced Details”:
These details tell the Test Agent which Control Center to connect to, what email and password to authenticate with, and what NTP server to use for time synchronization. The NTP server specified, 169.254.169.123, is an AWS-specific NTP server that is available within the availability zone. GCP offers a similar service with hostname metadata.google.internal, and in Azure you have the option to use public NTP servers such as time.windows.com or time.google.com. Picking a good NTP server is important for getting latency measurements that are as accurate as possible. After this step you may configure security groups and launch the VM.
Naturally the same could be done using the CLI to integrate into scripts. This is how my example above would be done from the CLI:
Normally, the Test Agent will deploy and boot in less than a minute, and it will then connect to the Netrounds Control Center and register automatically. Any additional configuration is done through the Netrounds Control Center GUI or through the APIs as normal.
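As a scripted counterpart to the console walkthrough above, the same launch could be sketched in Python with boto3. Please treat this as an illustration under assumptions: the `netrounds_test_agent` section and its key names (`server`, `email`, `password`, `ntp_server`) are placeholders I chose for the sketch and should be checked against the Test Agent documentation, and the AMI, subnet, and credential values are dummies. The boto3 parameters (`ImageId`, `InstanceType`, `SubnetId`, `MinCount`, `MaxCount`, `UserData`) are the standard `run_instances` arguments.

```python
# NTP choices per cloud, as mentioned in the text above.
NTP_BY_CLOUD = {
    "aws": "169.254.169.123",
    "gcp": "metadata.google.internal",
    "azure": "time.windows.com",
}


def build_user_data(server, email, password, cloud):
    """Render cloud-init user data as text.

    ASSUMPTION: the section name and keys below are hypothetical placeholders,
    not the documented Test Agent schema. Values are inserted unquoted, so
    passwords containing YAML special characters would need quoting.
    """
    lines = [
        "#cloud-config",
        "netrounds_test_agent:",                 # assumed section name
        f"  server: {server}",                   # assumed key
        f"  email: {email}",                     # assumed key
        f"  password: {password}",               # assumed key
        f"  ntp_server: {NTP_BY_CLOUD[cloud]}",  # assumed key
    ]
    return "\n".join(lines) + "\n"


def build_launch_request(ami_id, instance_type, subnet_id, user_data):
    """Build keyword arguments for the boto3 EC2 client's run_instances call."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "SubnetId": subnet_id,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": user_data,
    }


def launch(request, region):
    """Launch the instance; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.run_instances(**request)


# Dummy values for illustration only.
user_data = build_user_data("login.example.com:443", "user@example.com", "change-me", "aws")
request = build_launch_request("ami-00000000", "c5.large", "subnet-00000000", user_data)
print(user_data)
```

Calling `launch(request, "eu-north-1")` would then start the instance. Keeping the request building separate from the launch call makes it easy to review or log what will be sent before anything is created.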
It’s worth noting that in AWS, the EC2 instance normally gets assigned a private IP address. If you want to use the cloud Test Agent as the server over the public Internet you need to adjust the security groups to allow incoming requests, and also change the interface to “Use public IP” so that the Netrounds client will know to contact its public IP address and not the private one.
As a bonus, AWS has native support for IPv6, so if your VPC has it configured you can assign an IPv6 address to the instance and configure the eth0 interface for DHCPv6, and voilà:
As an experiment in AWS, we quickly and easily spun up three Test Agents, one in each availability zone in the new Stockholm region (eu-north-1), and as a baseline looked at latency, jitter and loss.
All looks good from this perspective, with one-way latency values at around 0.5 ms in each direction. Light traveling in a fiber for 0.5 milliseconds covers a physical distance of roughly 100 kilometers, which aligns approximately with the geographical distance between the three zones around Stockholm. It may even be possible to pinpoint which zone is located where by applying some math to the numbers, but I'll leave that as an exercise for the reader.
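The latency-to-distance estimate can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch that assumes a typical refractive index of about 1.5 for optical fiber (so light travels at roughly 200,000 km/s); the function name and the index value are my assumptions, not measured properties of the AWS network.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.5           # assumed typical value for optical fiber


def fiber_distance_km(one_way_latency_ms, refractive_index=FIBER_REFRACTIVE_INDEX):
    """Distance light covers in fiber during the given one-way latency."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_PER_S / refractive_index
    return speed_in_fiber * one_way_latency_ms / 1000.0


distance = fiber_distance_km(0.5)
print(f"{distance:.0f} km")
```

For 0.5 ms this evaluates to just under 100 km. It is best read as an upper bound on the real distance, since the measured latency also includes switching and queuing delay, and fiber routes are rarely straight lines.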
Drilling into the measurements we could look at high-resolution jitter:
We can see that the largest jitter over the past 15 minutes on this link in one direction is 0.05 milliseconds (50 microseconds). As another validation, we might check how much throughput we can obtain in a single TCP session:
This 8904 Mbit/s goodput is the actual TCP payload performance. The figure is equivalent to almost 10 Gbit/s on the Ethernet layer, which is what AWS states as “up to” performance for a c5.large instance. Based on these tests we can conclude that the intra-region network in the eu-north-1 region is performing very well: low latency, extremely low jitter and high throughput. Excellent, now I can feel confident that any services I deploy in the same zones will also perform nicely between the zones from a network point of view. And since I can leave these tests running 24/7 as monitors, I can get automatic notifications if any KPIs degrade. Obviously, I could also do the same between different regions, and to my own data centers or office locations. Nice.
Hope you found this blog post useful! We look forward to hearing about your ideas for using Netrounds Test Agents in public clouds in order to conduct active service assurance end-to-end on the data plane from an end-user perspective.
Watch our webinar "A New Level of Visibility Across Multicloud Environments and Service Chains" for a deeper look at active service assurance covering end-to-end visibility of the data plane.