Cost efficiency is claimed to be one of the main benefits of cloud infrastructure. Some people might reply: “What? Have you seen the AWS data transfer costs?”
In this article we’ll describe how AWS traffic costs took us by surprise on one of our recent projects, why it happened, and what the nature of the unexpected charges was. You’ll also learn how the Apiko DevOps team solved this issue, and get useful tips for AWS cost optimization.
Why the AWS bill exceeded our expectations
When designing the underlying cloud solution for a Kubernetes cluster, you pay a lot of attention to configuring the infrastructure so that it suits the app requirements best. In the process, it’s easy to forget about the cost of data transferred between your servers and services. Eventually, this results in a massive bill at the end of the month, caused by data traffic between different Availability Zones (AZs) or regions.
That’s what we faced in one of our projects after migrating the database to a self-hosted solution. Besides overlooking the AWS data charges, some AWS concepts and the traffic costs they generate were challenging to foresee. A daily cost of about 12 USD for nearly 1.2 TB of data traffic hit us unexpectedly.
The origin of AWS traffic costs
As data may be shared between different regions, AZs, and architecture components, understanding all the types of data traffic and the respective costs can be frustrating. Here’s a somewhat generalized summary of the main traffic costs, and an AWS architecture diagram to illustrate them:
The main types of AWS paid data transfers include:
- Cross-AZ and cross-region traffic
- Outbound traffic from the Kubernetes cluster to the internet
- Elastic Load Balancing, and others.
You can find more pricing details in the article about AWS data transfer cost for common architectures.
The hidden “cost generator” on our project
In our case, there are 6 nodes in the Elastic Kubernetes Service (EKS) cluster and 4 database servers spread across 3 AZs. There is no multi-region infrastructure, as the app will be used only within one country. However, the application generates a large amount of data for the database servers.
Considering the data transfer pricing mentioned above, for AWS cost optimization we need to make the most of:
- intra-AZ connections between the servers
- private database server endpoints (private DNS hostnames and IPs).
What we didn’t foresee was that most of the time, the Kubernetes scheduler placed the application containers (pods) randomly across all of the nodes in different AZs. That’s the normal default behavior of the Kubernetes scheduler unless affinity rules are set. It significantly increased cross-AZ traffic, resulting in the inexplicably high bill.
The Kubernetes scheduler is a control plane process which assigns Pods to Nodes. The scheduler determines which Nodes are valid placements for each Pod in the scheduling queue according to constraints and available resources. The scheduler then ranks each valid Node and binds the Pod to a suitable Node.
Solution to the problem
To get rid of the unnecessary data transfer costs, we needed to come up with a pod scheduling pattern. It had to be designed in a way that would reduce costs without affecting application availability.
After some research, we found out that we could schedule our pods by using nodeAffinity and default EKS node labels, like topology.kubernetes.io/zone=us-east-1a, which represent the node’s Availability Zone.
There are two modes of nodeAffinity:
- “Hard” affinity - requiredDuringSchedulingIgnoredDuringExecution - means that the scheduler will place the pod only on a node that has the specified label. If no such node exists, the pod stays in a Pending state.
- “Soft” affinity - preferredDuringSchedulingIgnoredDuringExecution - if the scheduler can’t find a node with the specified label, it still schedules the pod on another available node.
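In a pod spec, both modes live under spec.affinity.nodeAffinity. Here’s a minimal sketch of the two fragments (the zone value is illustrative, not from our actual manifests):

```yaml
# "Hard" affinity: the pod is scheduled only on a node in the given zone
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-east-1a

# "Soft" affinity: a node in the given zone is preferred, but any node works
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-east-1a
```

Note that soft affinity takes a weight (1–100), which the scheduler uses to rank candidate nodes rather than to filter them out.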
In our case, soft affinity is the perfect choice: if an outage occurs in one AZ, the application stays available because the scheduler can place pods on nodes in other zones. Let’s test it locally!
- Spin up a minikube cluster with three nodes (one control-plane node and two workers):
$ minikube start --nodes 3 --driver docker
- Let’s mark the minikube-m02 node with the topology.kubernetes.io/zone=us-east-1a label and the minikube-m03 node with topology.kubernetes.io/zone=us-east-1b:
$ kubectl label nodes minikube-m02 topology.kubernetes.io/zone=us-east-1a
$ kubectl label nodes minikube-m03 topology.kubernetes.io/zone=us-east-1b
- Check the result:
$ kubectl describe nodes minikube-m02
$ kubectl describe nodes minikube-m03
- Create a simple deployment with hard nodeAffinity first, targeting the us-east-1c zone, and check the pod status:
$ kubectl apply -f hard_affinity_deployment.yml
The scheduler can’t place the pod because there is no node with the us-east-1c label, so the pod stays in the Pending state.
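The manifest itself wasn’t reproduced above; here’s a minimal sketch of what hard_affinity_deployment.yml might contain (the deployment name and container image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hard-affinity-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hard-affinity-demo
  template:
    metadata:
      labels:
        app: hard-affinity-demo
    spec:
      affinity:
        nodeAffinity:
          # Hard requirement: only nodes labeled with this zone qualify
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - us-east-1c
      containers:
        - name: nginx
          image: nginx:1.25
```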
- Now, let’s take a look at what happens if we use the same selector but with soft affinity:
$ kubectl apply -f soft_affinity_deployment.yml
As we can see, the scheduler places the pod on another available node, which is exactly what we wanted to achieve.
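For comparison, a sketch of what soft_affinity_deployment.yml might look like: the only difference from the hard-affinity version is the affinity stanza (names and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: soft-affinity-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: soft-affinity-demo
  template:
    metadata:
      labels:
        app: soft-affinity-demo
    spec:
      affinity:
        nodeAffinity:
          # Soft preference: us-east-1c is preferred, but any node is acceptable
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - us-east-1c
      containers:
        - name: nginx
          image: nginx:1.25
```

Because no node carries the us-east-1c label, the preference simply contributes nothing to the ranking and the pod lands on whichever node fits best.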
AWS cost optimization results
After the optimization, we got a significant price drop and one more happy customer! Here are some Cost Explorer graphs showing the results before and after switching to intra-AZ traffic.
Before: About 12 USD and approximately 1.2 TB of data per day on average for cross-AZ traffic.
After: Having implemented nodeAffinity, we got about 75 GB of daily data traffic costing around 0.75 USD:
Tips for AWS data transfer cost optimization
Here are the main takeaways on how to avoid extra AWS data charges.
- Try to keep your servers in one region
- If you need multi-region high availability, compare data transfer prices and pick the region where traffic will be cheapest
- Keep traffic between your key services and traffic-heavy applications within a single AZ
- Avoid using public endpoints, such as public IPs or DNS names, when you can use private connections
Hope this article helps you avoid unexpected expenses. If you have any questions or would like to discuss your project in particular, feel free to reach out!