airflow_aws

Apache Airflow on AWS EKS: The Hands-On Guide

Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. If you have many ETL pipelines to manage, Airflow is a must-have.

In the Apache Airflow on AWS EKS: The Hands-On Guide course, you are going to learn everything you need to set up a production-ready architecture on AWS EKS with Airflow and the Kubernetes Executor. Discover how to execute tasks at scale, as you would in your company.

Materials (required for the course)

You will find the materials directly in one of the course videos.

Curriculum

Section 1: Introduction

  1. Important Prerequisites
  2. Who I am
  3. Your Airflow Journey
  4. Overview of the architecture
  5. The Checklist

Section 2: Configuring AWS

  1. Defining a budget
  2. [Practice] Creating the IAM admin group
  3. [Practice] Creating the IAM admin user

Section 3: Exploring the DevOps world

  1. Why is knowing DevOps concepts important?
  2. Reminder about Kubernetes
  3. Kubernetes Quiz
  4. What is IaC, or Infrastructure as Code?
  5. IaC Quiz
  6. Deployments with GitOps
  7. GitOps made simple with Flux
  8. GitOps Quiz

Section 4: Creating the EKS cluster with GitOps

  1. [Practice] Creating the Cloud9 environment for the workstation
  2. [Practice] Configuring the workstation
  3. [Practice] Configuring Cloud9 with the Admin account
  4. [Practice] Creating the IAM role to interact with the EKS cluster
  5. AZs, VPCs and Subnets in AWS
  6. What is AWS EKS?
  7. [Practice] Creating and configuring the Git repository for GitOps
  8. [Practice] Creating a multi-node EKS cluster with eksctl and GitOps
  9. [Practice] Configuring the EKS cluster with Flux
  10. Namespaces in Kubernetes
  11. [Practice] Creating dev, staging and prod namespaces
  12. Clean Up

Section 5: Deploying Airflow with DAGs

  1. Set Up
  2. Deployments with Helm
  3. [Practice] Overview of the Airflow Helm chart
  4. Scaling with the Kubernetes Executor
  5. [Practice] Creating your first release of Airflow
  6. [Practice] Deploying Airflow with Flux
  7. Troubleshooting deployments with Flux
  8. Synchronizing DAGs in Kubernetes
  9. [Practice] Fetching DAGs with Git-Sync
  10. [Practice] Running DAGs with Git-Sync
  11. Secrets in Kubernetes
  12. [Practice] Fetching DAGs with Git-Sync from a private repository
  13. [Practice] Adding the secret in the repo
  14. Volumes in Kubernetes
  15. Introduction to AWS EFS
  16. [Practice] Configuring AWS EFS
  17. [Practice] Sharing DAGs between pods with AWS EFS
  18. Clean Up

Section 6: Building CI/CD pipelines to deploy Airflow

  1. Set Up
  2. What is AWS CodePipeline?
  3. [Practice] Building a CI/CD pipeline with CodePipeline and ECR
  4. [Practice] Deploying Airflow in EKS with CodePipeline and Flux
  5. Unit testing in Airflow
  6. [Practice] Unit testing your DAGs
  7. [Practice] Building the CI/CD pipeline in dev with unit tests
  8. [Practice] Integration tests for testing tasks in DAGs
  9. [Practice] Building the CI/CD pipeline in staging with integration tests
  10. [Practice] Clean up

Section 7: Exposing the Airflow UI

  1. [Practice] Set up
  2. Services in Kubernetes
  3. Architecture with the Elastic Load Balancer
  4. [Practice] Exposing the Airflow UI with AWS Elastic Load Balancer
  5. What is an Ingress?
  6. Architecture with the AWS ALB Ingress controller
  7. [Practice] Exposing the Airflow UI with AWS ALB Ingress
  8. [Practice] Exposing the staging environment with AWS ALB
  9. Quick reminder about SSL
  10. [Practice] Creating a domain for Airflow with ExternalDNS and AWS Route 53
  11. [Practice] Activating SSL on the Airflow UI
  12. [Practice] Fixing the AWS ALB’s health checks
  13. [Practice] Exporting the SSL secret object
  14. [Practice] Upgrading the staging environment
  15. [Exercise] Enabling DNS and SSL for staging
  16. [Practice] Creating subdomains to access the UIs of Airflow
  17. Clean Up

Section 8: Logging with Airflow in AWS EKS

  1. Set Up
  2. RBAC in Kubernetes
  3. Permission issues when accessing pod logs
  4. [Practice] Storing logs in AWS EFS
  5. [Practice] Remote logging with AWS S3
  6. Limitations of remote logging in AWS S3
  7. Remote logging with AWS CloudWatch
  8. Sensitive data with Secrets Backends
  9. [Practice] Managing connections with AWS Secrets Manager
  10. [Practice] Storing the secret object of AWS Secrets Manager for Flux
  11. Clean Up

Section 9: Configuring the production environment

  1. Set up
  2. [Practice] Creating the production environment
  3. Identifying single points of failure
  4. [Practice] Making the Airflow UI highly available
  5. AWS Relational Database Service
  6. [Practice] Airflow with AWS RDS
  7. DAG Serialization
  8. [Practice] Making the web server stateless with DAG Serialization
  9. Clean Up
  10. Congratulations!