
Apache Airflow: The Hands-On Guide

Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. If you have many ETL pipelines to manage, Airflow is a must-have.

In the Ultimate Hands-On Course to Master Apache Airflow, you will learn everything you need to master this powerful tool and take it to the next level. Every aspect of Airflow is addressed, including mastering your DAGs, monitoring, scalability, security, and much more.

Materials (required for the course)

You will find the materials on Udemy, under the title of the video "Development Environment".

Curriculum

Section 1: Course Introduction

  1. Important Prerequisites
  2. Course Structure
  3. Who I am
  4. Development Environment

Section 2: The basics of Apache Airflow

  1. Introduction
  2. Why Airflow?
  3. What is Airflow?
  4. How does Airflow work?
  5. [Practice] Installing Airflow
  6. [Practice] Quick Tour of Airflow UI
  7. [Practice] Quick Tour of Airflow CLI
  8. [Practice] Controlling your DAGs with the CLI

Section 3: The Forex Data Pipeline

  1. Introduction
  2. Docker reminder
  3. Troubleshoot Docker performance on macOS
  4. Project: The Forex Data Pipeline
  5. What is a DAG?
  6. Defining your first DAG
  7. What is an Operator?
  8. [Practice] Checking if the API is available – HttpSensor
  9. [Practice] Checking if the currency file is available – FileSensor
  10. [Practice] Downloading the forex rates from the API – PythonOperator
  11. [Practice] Saving the forex rates in the HDFS – BashOperator
  12. [Practice] Creating the Hive table forex_rates – HiveOperator
  13. [Practice] Processing the forex rates with Spark – SparkSubmitOperator
  14. [Practice] Sending an email notification – EmailOperator
  15. [Practice] Sending a Slack notification – SlackAPIPostOperator
  16. Operator Relationships and Bitshift Composition
  17. [Practice] Adding dependencies between tasks
  18. [Practice] The Forex Data Pipeline in action!

Section 4: Mastering your DAGs

  1. Introduction
  2. start_date and schedule_interval parameters demystified
  3. [Practice] Manipulating the start_date with schedule_interval
  4. Backfill and Catchup
  5. [Practice] Catching up non-triggered DAGRuns
  6. Dealing with timezones in Airflow
  7. [Practice] Making your DAGs timezone aware
  8. How to make your tasks dependent
  9. [Practice] Creating task dependencies between DagRuns
  10. How to structure your DAG folder
  11. [Practice] Organizing your DAGs folder
  12. [Practice] How the Web Server works
  13. How to deal with failures in your DAGs
  14. [Practice] Retry and Alerting
  15. How to test your DAGs
  16. [Practice] Unit testing your DAGs

Section 5: Distributing Apache Airflow

  1. Introduction
  2. Sequential Executor with SQLite
  3. Local Executor with PostgreSQL
  4. [Practice] Executing tasks in parallel with the Local Executor
  5. [Practice] Ad Hoc Queries with the metadata database
  6. Scale out Apache Airflow with Celery Executors and Redis
  7. [Practice] Set up the Airflow cluster with Celery Executors and Docker
  8. [Practice] Distributing your tasks with the Celery Executor
  9. [Practice] Adding new worker nodes with the Celery Executor
  10. [Practice] Sending tasks to a specific worker with Queues
  11. [Practice] Pools and priority_weights: Limiting parallelism – prioritizing tasks
  12. Kubernetes Reminder
  13. Scaling Airflow with Kubernetes Executors
  14. [Practice] Set up a 3-node Kubernetes Cluster with Vagrant and Rancher
  15. [Practice] Installing Airflow with Rancher and the Kubernetes Executor
  16. [Practice] Running your DAGs with the Kubernetes Executor

Section 6: Improving your DAGs with advanced concepts

  1. Introduction
  2. Minimising Repetitive Patterns With SubDAGs
  3. [Practice] Grouping your tasks with SubDAGs and Deadlocks
  4. Making different paths in your DAGs with Branching
  5. [Practice] Make Your First Conditional Task Using Branching
  6. Trigger rules for your tasks
  7. [Practice] Changing how your tasks are triggered
  8. Avoid hard coding values with Variables, Macros and Templates
  9. [Practice] Templating your tasks
  10. How to share data between your tasks with XCOMs
  11. [Practice] Sharing (big?) data with XCOMs
  12. TriggerDagRunOperator or when your DAG controls another DAG
  13. [Practice] Trigger a DAG from another DAG
  14. Dependencies between your DAGs with the ExternalTaskSensor
  15. [Practice] Make your DAGs dependent with the ExternalTaskSensor

Section 7: Deploying Airflow on AWS EKS with Kubernetes Executors and Rancher

  1. Introduction
  2. Quick overview of AWS EKS
  3. [Practice] Set up an EC2 instance for Rancher
  4. [Practice] Create an IAM User with permissions
  5. [Practice] Create an ECR repository
  6. [Practice] Create an EKS cluster with Rancher
  7. How to access your applications from the outside
  8. [Practice] Deploy Nginx Ingress with Catalogs (Helm)
  9. [Practice] Deploy and run Airflow with the Kubernetes Executor on EKS
  10. [Practice] Cleaning your AWS services

Section 8: Monitoring Apache Airflow

  1. Introduction
  2. How the logging system works in Airflow
  3. [Practice] Setting up custom logging
  4. [Practice] Storing your logs in AWS S3
  5. Elasticsearch Reminder
  6. [Practice] Configuring Airflow with Elasticsearch
  7. [Practice] Monitoring your DAGs with Elasticsearch
  8. Introduction to metrics
  9. [Practice] Monitoring Airflow with TIG stack
  10. [Practice] Triggering alerts for Airflow with Grafana
  11. Airflow maintenance DAGs

Section 9: Security in Apache Airflow

  1. Introduction
  2. [Practice] Encrypting sensitive data with Fernet
  3. [Practice] Rotating the Fernet Key
  4. [Practice] Hiding variables
  5. [Practice] Password authentication and filter by owner
  6. [Practice] RBAC UI