
Apache Airflow: The Hands-On Guide
Apache Airflow is an open-source platform to programmatically author, schedule and monitor workflows. If you have many ETL pipelines to manage, Airflow is a must-have.
In the Ultimate Hands-On Course to Master Apache Airflow, you will learn everything you need to master this powerful tool. The course covers every aspect of Airflow, including mastering your DAGs, monitoring, scalability, security and much more.
Materials (required for the course)
You will find the materials under the video titled "Development Environment" on Udemy.

Curriculum
Section 1: Course Introduction
- Important Prerequisites
- Course Structure
- Who I am
- Development Environment
Section 2: The basics of Apache Airflow
- Introduction
- Why Airflow?
- What is Airflow?
- How Airflow works
- [Practice] Installing Airflow
- [Practice] Quick Tour of Airflow UI
- [Practice] Quick Tour of Airflow CLI
- [Practice] Controlling your DAGs with the CLI
Section 3: The Forex Data Pipeline
- Introduction
- Docker reminder
- Troubleshoot Docker performance on macOS
- Project: The Forex Data Pipeline
- What is a DAG?
- Defining your first DAG
- What is an Operator?
- [Practice] Checking if the API is available – HttpSensor
- [Practice] Checking if the currency file is available – FileSensor
- [Practice] Downloading the forex rates from the API – PythonOperator
- [Practice] Saving the forex rates in the HDFS – BashOperator
- [Practice] Creating the Hive table forex_rates – HiveOperator
- [Practice] Processing the forex rates with Spark – SparkSubmitOperator
- [Practice] Sending an email notification – EmailOperator
- [Practice] Sending a Slack notification – SlackAPIPostOperator
- Operator Relationships and Bitshift Composition
- [Practice] Adding dependencies between tasks
- [Practice] The Forex Data Pipeline in action!
Section 4: Mastering your DAGs
- Introduction
- Start_date and schedule_interval parameters demystified
- [Practice] Manipulating the start_date with schedule_interval
- Backfill and Catchup
- [Practice] Catching up non-triggered DagRuns
- Dealing with timezones in Airflow
- [Practice] Making your DAGs timezone aware
- How to make your tasks dependent
- [Practice] Creating task dependencies between DagRuns
- How to structure your DAG folder
- [Practice] Organizing your DAGs folder
- [Practice] How the Web Server works
- How to deal with failures in your DAGs
- [Practice] Retry and Alerting
- How to test your DAGs
- [Practice] Unit testing your DAGs
Section 5: Distributing Apache Airflow
- Introduction
- Sequential Executor with SQLite
- Local Executor with PostgreSQL
- [Practice] Executing tasks in parallel with the Local Executor
- [Practice] Ad Hoc Queries with the metadata database
- Scale out Apache Airflow with Celery Executors and Redis
- [Practice] Set up the Airflow cluster with Celery Executors and Docker
- [Practice] Distributing your tasks with the Celery Executor
- [Practice] Adding new worker nodes with the Celery Executor
- [Practice] Sending tasks to a specific worker with Queues
- [Practice] Pools and priority_weights: Limiting parallelism – prioritizing tasks
- Kubernetes Reminder
- Scaling Airflow with Kubernetes Executors
- [Practice] Set up a 3-node Kubernetes cluster with Vagrant and Rancher
- [Practice] Installing Airflow with Rancher and the Kubernetes Executor
- [Practice] Running your DAGs with the Kubernetes Executor
Section 6: Improving your DAGs with advanced concepts
- Introduction
- Minimising Repetitive Patterns With SubDAGs
- [Practice] Grouping your tasks with SubDAGs and Deadlocks
- Making different paths in your DAGs with Branching
- [Practice] Make Your First Conditional Task Using Branching
- Trigger rules for your tasks
- [Practice] Changing how your tasks are triggered
- Avoid hard coding values with Variables, Macros and Templates
- [Practice] Templating your tasks
- How to share data between your tasks with XCOMs
- [Practice] Sharing (big?) data with XCOMs
- TriggerDagRunOperator or when your DAG controls another DAG
- [Practice] Trigger a DAG from another DAG
- Dependencies between your DAGs with the ExternalTaskSensor
- [Practice] Make your DAGs dependent with the ExternalTaskSensor
Section 7: Deploying Airflow on AWS EKS with Kubernetes Executors and Rancher
- Introduction
- Quick overview of AWS EKS
- [Practice] Set up an EC2 instance for Rancher
- [Practice] Create an IAM User with permissions
- [Practice] Create an ECR repository
- [Practice] Create an EKS cluster with Rancher
- How to access your applications from the outside
- [Practice] Deploy Nginx Ingress with Catalogs (Helm)
- [Practice] Deploy and run Airflow with the Kubernetes Executor on EKS
- [Practice] Cleaning your AWS services
Section 8: Monitoring Apache Airflow
- Introduction
- How the logging system works in Airflow
- [Practice] Setting up custom logging
- [Practice] Storing your logs in AWS S3
- Elasticsearch Reminder
- [Practice] Configuring Airflow with Elasticsearch
- [Practice] Monitoring your DAGs with Elasticsearch
- Introduction to metrics
- [Practice] Monitoring Airflow with TIG stack
- [Practice] Triggering alerts for Airflow with Grafana
- Airflow maintenance DAGs
Section 9: Security in Apache Airflow
- Introduction
- [Practice] Encrypting sensitive data with Fernet
- [Practice] Rotating the Fernet Key
- [Practice] Hiding variables
- [Practice] Password authentication and filter by owner
- [Practice] RBAC UI