Sep 29, 2022

AWS Data Transfer Cost Optimization for Elastic Kubernetes Service: Case Study

Cost efficiency is claimed to be one of the main benefits of cloud infrastructure. I bet some people would say “What? Have you seen the AWS data transfer cost?”

In this article, we’ll explain how AWS traffic costs took us by surprise on one of our recent projects, why it happened, and where the unexpected charges came from. You’ll also learn how the Apiko DevOps team solved the issue, and get practical tips for AWS cost optimization.

Why the AWS bill exceeded our expectations

When designing the underlying cloud solution for a Kubernetes cluster, you pay a lot of attention to configuring the infrastructure so that it suits the app requirements best. At this stage, it is easy to forget about the cost of data transferred between your servers and services. The result can be a massive bill at the end of the month, caused by data traffic between different Availability Zones (AZs) or regions.

That’s what we faced on one of our projects after migrating the database to a self-hosted solution. Besides forgetting about the AWS data charges, we found some AWS concepts and the resulting traffic costs hard to foresee. A daily cost of about 12 USD for nearly 1.2 TB of data traffic hit us unexpectedly.
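As a rough sanity check, the two figures above imply a rate of about 0.01 USD per GB (12 USD for roughly 1,200 GB, counting 1 TB as 1,000 GB). The sketch below assumes that derived rate; it is not a quoted AWS price, and the function name `cross_az_cost_usd` is made up for illustration.

```shell
# Estimate the daily charge for a given volume of cross-AZ traffic.
# Usage: cross_az_cost_usd <gigabytes> [usd_per_gb]
# The default rate of 0.01 USD/GB is an assumption derived from the
# article's own figures, not a quoted AWS price.
cross_az_cost_usd() {
  awk -v gb="$1" -v rate="${2:-0.01}" 'BEGIN { printf "%.2f\n", gb * rate }'
}

cross_az_cost_usd 1200   # about 1.2 TB per day, prints 12.00
```

Passing a second argument lets you plug in the rate that applies to your own region and traffic type.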

The origin of AWS traffic costs

Data can flow between regions, AZs, and architecture components, so keeping track of every type of data transfer and its cost can be frustrating. Here’s a somewhat generalized summary of the main traffic costs, and an AWS architecture diagram to illustrate them:

AWS architecture diagram

The main types of paid AWS data transfer include:

Cross-AZ and cross-region traffic
Outbound traffic from the Kubernetes cluster to the internet
Elastic Load Balancing, and others

You can find more pricing details in the article about AWS data transfer cost for common architectures.

The hidden “cost generator” on our project

In our case, there are 6 nodes in the Elastic Kubernetes Service, and 4 database servers spread across 3 AZs. There is no multi-region infrastructure, as the app will be used only within one country. However, the application generates a large amount of data for the database servers.

Considering the data transfer pricing mentioned above, to optimize AWS costs we need to make the most of

intra-AZ connections between the servers
private database server endpoints (private DNS hostnames and IPs).

What we didn’t foresee was that, most of the time, the Kubernetes scheduler spread the app’s pods (the containers running the app) across nodes in different AZs. This is normal default behavior for the scheduler unless affinity rules are set. It significantly increased cross-AZ traffic, which explained the unexpectedly high bill.

The Kubernetes scheduler is a control plane process which assigns Pods to Nodes. The scheduler determines which Nodes are valid placements for each Pod in the scheduling queue according to constraints and available resources. The scheduler then ranks each valid Node and binds the Pod to a suitable Node.

Source: kubernetes.io

Solution to the problem

To get rid of the unnecessary data transfer costs, we needed to come up with a pod scheduling pattern. It had to reduce our costs without affecting application availability.

After some research, we found out that we could schedule our pods by using nodeAffinity and default EKS node labels, like topology.kubernetes.io/zone=us-east-1a, which represents the node’s Availability Zone.
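To see how pods are currently spread across zones, it helps to list the zone label of every node and then the node each pod runs on. A minimal sketch using standard kubectl flags; the wrapper function names are hypothetical and only there so the commands are easy to reuse.

```shell
# Show each node with its Availability Zone as an extra column.
# -L adds the value of the given label as a column in the output.
nodes_with_zone() {
  kubectl get nodes -L topology.kubernetes.io/zone
}

# Show each pod together with the node it was scheduled on (NODE column).
pods_with_node() {
  kubectl get pods -o wide
}
```

Running `nodes_with_zone` and then `pods_with_node` against the cluster, and matching the NODE column to the zone column, shows how many app pods sit outside the zone of the database server they talk to.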

There are two modes of nodeAffinity:

“Hard” affinity - requiredDuringSchedulingIgnoredDuringExecution - means that the scheduler places the pod only on a node that has the specified label. If there is no such node, the pod stays in the Pending state.
“Soft” affinity - preferredDuringSchedulingIgnoredDuringExecution - means that the scheduler prefers nodes with the specified label, but still schedules the pod elsewhere if it can’t find one.

With soft affinity, if an emergency occurs in one AZ, the application stays available because the scheduler can place it on other nodes. In our case, that makes soft affinity a perfect choice. Let’s test it locally!

Spin up a minikube cluster with three nodes (one control plane and two workers):

$ minikube start --nodes 3 --driver docker

Let’s label the minikube-m02 node with topology.kubernetes.io/zone=us-east-1a and the minikube-m03 node with topology.kubernetes.io/zone=us-east-1b.

$ kubectl label nodes minikube-m02 topology.kubernetes.io/zone=us-east-1a
node/minikube-m02 labeled

$ kubectl label nodes minikube-m03 topology.kubernetes.io/zone=us-east-1b
node/minikube-m03 labeled

Check the result:

$ kubectl describe nodes minikube-m02
Name:               minikube-m02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=minikube-m02
                    kubernetes.io/os=linux
                    topology.kubernetes.io/zone=us-east-1a

$ kubectl describe nodes minikube-m03
Name:               minikube-m03
Roles:              <none>
Labels:             beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=minikube-m03
                    kubernetes.io/os=linux
                    topology.kubernetes.io/zone=us-east-1b

First, create a simple deployment with hard nodeAffinity for the us-east-1c zone, and check the pod status:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - us-east-1c

$ kubectl apply -f hard_affinity_deployment.yml
deployment.apps/nginx-deployment created

$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-7648977687-pzjq4   0/1     Pending   0          24s

The scheduler can’t place the pod because there is no node with the us-east-1c label.

Now, let’s take a look at what happens if we use the same selector but with soft affinity:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-soft-affinity
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 10
            preference:
              matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-east-1c

$ kubectl apply -f soft_affinity_deployment.yml
deployment.apps/nginx-deployment-soft-affinity created
$ kubectl get pods
NAME                                              READY   STATUS    RESTARTS   AGE
nginx-deployment-soft-affinity-764ff57948-f58zv   1/1     Running   0          31s

As we can see, the scheduler places the pod on another available node, which is exactly what we wanted to achieve.
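The test above only demonstrates the fallback. To watch the preference being honored, the same manifest can point at a zone that does exist in the test cluster. The sketch below writes such a variant to disk; the file name, the Deployment name, and the `app` label value are hypothetical, and us-east-1a is the label attached to minikube-m02 earlier.

```shell
# Write a variant of the soft-affinity manifest that prefers us-east-1a,
# the zone label attached to minikube-m02 earlier.
# File name, Deployment name, and app label are hypothetical.
cat > soft_affinity_existing_zone.yml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-preferred-zone
  labels:
    app: nginx-preferred-zone
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-preferred-zone
  template:
    metadata:
      labels:
        app: nginx-preferred-zone
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 10
            preference:
              matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-east-1a
EOF
```

After `kubectl apply -f soft_affinity_existing_zone.yml`, running `kubectl get pods -o wide` should in most cases show the pod on minikube-m02. Because soft affinity only adds weight to node scoring, that placement is likely rather than guaranteed.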

AWS cost optimization results

After the optimization, we got a huge price drop and one more happy customer! Here are some Cost Explorer graphs showing the results before and after switching to intra-AZ traffic.

Before: about 12 USD and approximately 1.2 TB of data per day on average for inter-AZ traffic.

AWS data transfer cost

After: having implemented nodeAffinity, we got about 75 GB of data traffic costing around 0.75 USD daily:

AWS cost optimization
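To put a number on the drop, the before and after daily figures can be turned into a percentage. A small sketch; the function name `cost_reduction_pct` is made up for illustration, and the inputs are the approximate daily costs quoted above.

```shell
# Percentage drop between two daily cost figures.
# Usage: cost_reduction_pct <before_usd> <after_usd>
cost_reduction_pct() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (old - new) / old * 100 }'
}

cost_reduction_pct 12 0.75   # prints 93.75
```

With these rounded inputs, the daily cross-AZ transfer cost fell by roughly 94%.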

Tips for AWS data transfer cost optimization

Here are the main takeaways on how to avoid extra AWS data charges.

Try to keep your servers in one region
If you need high availability, try to find a region in which incoming traffic will be the cheapest
Keep traffic between your key services or applications that generate a lot of traffic within a single AZ
Avoid using public endpoints, such as public IPs or DNS names, when you can use private connections

We hope this article helps you avoid unexpected expenses. If you have any questions or would like to discuss your project in particular, feel free to reach out!