Wouldn’t it be convenient to run Apache Airflow locally with the Kubernetes Executor on a multi-node Kubernetes cluster? That would be a great way to test your DAGs and understand how Airflow works in a Kubernetes environment, wouldn’t it? Well, that’s exactly what we are going to do here. I will show you, step by step, how to quickly set up your own development environment and start running Airflow locally on Kubernetes. If you want to learn more about Airflow, don’t forget to check my course: Apache Airflow: The Complete Hands-On Introduction. Let’s get started!
Prerequisites
As we are going to create a multi-node Kubernetes cluster and interact with it, there are some tools to install first. Let’s discover them.
The first one is KinD. KinD stands for Kubernetes IN Docker and lets you run local multi-node Kubernetes clusters using Docker containers as “nodes”. Unlike Minikube with a virtual machine driver, KinD starts up significantly faster since its nodes are containers rather than virtual machines. Take a look at the quick start guide to install it. Since KinD runs its nodes as Docker containers, Docker must be installed on your machine as well.
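To make the multi-node idea concrete, here is a minimal sketch of what a KinD cluster configuration could look like, generated from Python. The number of workers and the names `render_kind_config`, `kind-cluster.yaml` and `airflow-cluster` are my own assumptions for illustration, not values this setup requires.

```python
def render_kind_config(workers: int = 3) -> str:
    """Return a KinD cluster config with one control-plane node and `workers` worker nodes."""
    lines = [
        "kind: Cluster",
        "apiVersion: kind.x-k8s.io/v1alpha4",
        "nodes:",
        "- role: control-plane",
    ]
    # Each extra entry becomes one more Docker container acting as a worker node.
    lines += ["- role: worker"] * workers
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the config; the worker count of 3 is an arbitrary choice.
    print(render_kind_config(workers=3), end="")
```

If you save the output to a file (say `kind-cluster.yaml`, a hypothetical name), you could then create the cluster with `kind create cluster --name airflow-cluster --config kind-cluster.yaml`.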
The second tool to install is kubectl. If you are familiar with Kubernetes, you should already know kubectl. It is the official Kubernetes command-line tool and allows you to run commands against Kubernetes clusters. Whenever you want to deploy applications, manage cluster resources, or view logs, you will use kubectl. Check the documentation to install it.
The last tool we need is Helm. Helm is the package manager for Kubernetes and makes it very easy to install and manage Kubernetes applications. Helm relies on charts. A chart is a collection of files describing a set of Kubernetes resources. For example, the Airflow chart deploys a web server, the scheduler, the metastore, a service to access the UI and so on. Take a look at the Airflow chart here to get a better idea of what a chart is. Installing Helm is pretty straightforward, as you can see here.
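To give an idea of what working with a chart looks like, here is a small sketch that builds the Helm commands you would typically run to install Airflow. Everything specific in it is an assumption on my side: I use the official Apache Airflow chart repository and its `executor` value, and the release and namespace names are placeholders. The chart you end up using may expose different values.

```python
import shlex
from typing import List

# Assumptions for illustration: official Apache Airflow chart repo,
# plus placeholder release and namespace names.
CHART_REPO_NAME = "apache-airflow"
CHART_REPO_URL = "https://airflow.apache.org"
RELEASE = "airflow"
NAMESPACE = "airflow"


def helm_commands(executor: str = "KubernetesExecutor") -> List[List[str]]:
    """Build the Helm commands to register the chart repo and install or upgrade Airflow."""
    return [
        ["helm", "repo", "add", CHART_REPO_NAME, CHART_REPO_URL],
        ["helm", "repo", "update"],
        [
            "helm", "upgrade", "--install", RELEASE, f"{CHART_REPO_NAME}/airflow",
            "--namespace", NAMESPACE, "--create-namespace",
            # `executor` is a value of the official chart; other charts may name it differently.
            "--set", f"executor={executor}",
        ],
    ]


if __name__ == "__main__":
    # Only print the commands so you can review them before running anything.
    for cmd in helm_commands():
        print(shlex.join(cmd))
```

The script prints the commands instead of executing them, so nothing touches your cluster until you decide to run them yourself.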
Now that the tools are installed, let’s create the Kubernetes cluster to run Apache Airflow locally with the Kubernetes Executor.
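Before going further, it can be worth checking that the four tools are actually reachable from your shell. Here is a minimal sketch, assuming the binaries carry their default names (`docker`, `kind`, `kubectl`, `helm`); the helper `missing_tools` is a name I made up for this example.

```python
import shutil

# Default binary names; adjust if yours are installed under different names.
REQUIRED_TOOLS = ("docker", "kind", "kubectl", "helm")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

Note that this only checks that the binaries exist on your PATH. It does not verify versions or that the Docker daemon is running.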
Running Apache Airflow locally on Kubernetes
To give you a better hands-on experience, I made the following video where I show you how to set up everything you need to get Airflow running locally on a multi-node Kubernetes cluster. In this video, you will learn how to:
- Configure and create a multi-node Kubernetes cluster with KinD
- Install and upgrade the Helm chart of Airflow
- Build your Docker image of Airflow packaged with your DAGs
- Create a local registry to push your Docker image of Airflow
- Configure Airflow to execute your tasks with the Kubernetes Executor
That’s a lot of amazing stuff to learn! By the end, you will have Airflow running with the Kubernetes Executor in a local multi-node Kubernetes cluster. That way, you will be able to test and execute your DAGs in a Kubernetes environment without having to pay for a cloud provider. Enjoy!