Password Authentication in Apache Airflow

When you start using Apache Airflow in production, one of your top priorities is to restrict who can access it. Since Airflow orchestrates data pipelines, it is a centerpiece of your data platform and may handle sensitive data. In a company, several teams may want to access Airflow, so you have to know who is allowed in and who is not. In this tutorial, you will learn how to set up password authentication in Apache Airflow, which will be your first major security measure. Then, you will see how to filter DAGs by owner so that each team only sees the DAGs it owns. By the way, if you are interested in mastering Apache Airflow, check out my course right here. Alright, without further ado, let’s get started.

Authentication backends in Apache Airflow

Apache Airflow ships with several authentication backends that you can use to restrict access to the user interface. Here is the exhaustive list:

  • GitHub Enterprise
  • Google Auth
  • Kerberos
  • LDAP
  • Password

If you want to learn more about their implementation, you can take a look at the following link. For simplicity, we are going to start with password authentication in Apache Airflow. Don’t worry, I will show you how to set up Kerberos and LDAP authentication as well in future tutorials. If you want to stay in touch, don’t forget to share your email address so that I can reach you.

Password Authentication in Apache Airflow

Alright, in the video below, taken from my course The Ultimate Hands-On Course to Master Apache Airflow, I show you step by step how to set up password authentication in Apache Airflow. You will also discover how to filter your DAGs by owner so that each user only sees the DAGs he/she owns.
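If you cannot watch the video right away, here is a minimal sketch of the settings involved. It assumes an Airflow 1.10 installation, where the options authenticate, auth_backend and filter_by_owner live in the [webserver] section of airflow.cfg and can be overridden with environment variables named AIRFLOW__{SECTION}__{KEY}. The helper names to_env_name and password_auth_env are made up for this illustration and are not part of Airflow.

```python
def to_env_name(section, key):
    """Map an airflow.cfg option to its AIRFLOW__{SECTION}__{KEY} variable name."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


def password_auth_env():
    """Return environment overrides enabling password auth and owner filtering.

    Assumption: Airflow 1.10, where these three options sit under [webserver].
    """
    options = {
        "authenticate": "True",
        "auth_backend": "airflow.contrib.auth.backends.password_auth",
        "filter_by_owner": "True",
    }
    return {to_env_name("webserver", key): value for key, value in options.items()}


if __name__ == "__main__":
    # Print the variables in a form you can paste into a shell or an env file
    for name, value in password_auth_env().items():
        print("export {}={}".format(name, value))
```

The same three values can be written directly in airflow.cfg instead. Whichever way you choose, the webserver has to be restarted to pick them up.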

Conclusion

Password authentication is the simplest way to force users to enter a password before they can access Airflow. Note that the Python module bcrypt must be installed, along with the “password” extra, when you install Apache Airflow. There is no way to create a user from the Airflow user interface, so you have to use the code snippet shown in the video to generate one. In my opinion, this mechanism is fine at the very beginning, but if you start using Airflow at scale in your company, you should think about setting up RBAC (Role-Based Access Control) with Airflow. It is well integrated and well suited to narrowing each user’s permissions to their needs. If you want to learn more about it, you can check my course where I show you how to do it.
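For reference, here is a sketch of what such a user-creation script can look like. It is not a transcript of the snippet from the video. It follows the example given in the Airflow 1.10 security documentation and assumes Airflow was installed with the “password” extra (pip install "apache-airflow[password]"). The function name create_password_user is made up for this illustration.

```python
def create_password_user(username, email, password):
    """Create a user for the password auth backend (sketch for Airflow 1.10)."""
    # Imported lazily: these modules exist only where Airflow 1.10 is installed
    # with the "password" extra.
    from airflow import models, settings
    from airflow.contrib.auth.backends.password_auth import PasswordUser

    user = PasswordUser(models.User())
    user.username = username
    user.email = email
    user.password = password  # the setter stores a bcrypt hash, not the plain text

    # The user is written to the Airflow metadata database
    session = settings.Session()
    try:
        session.add(user)
        session.commit()
    finally:
        session.close()
```

You would call it once per user, for example create_password_user("marc", "marc@example.com", "a-strong-password") with placeholder values. Since the user is saved in the metadata database that sql_alchemy_conn points to, run the script in an environment that uses the same database as your webserver.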


I hope you enjoyed this tutorial and see you for the next one 🙂


3 thoughts on “Password Authentication in Apache Airflow”

  1. Hi Marc,
    I am trying to configure authentication on an airflow installation orchestrated by docker-compose.
    I added all the environment variables related to authentication (authenticate, auth_backend, filter_by_owner) inside entrypoint.sh as well.
    After restarting just the webserver, I am getting the login page. But when I enter the username and password on the page, it gives me an error saying “Incorrect login details”.
    I looked at the logs of webserver but there is no help from there.
    I am not sure where the username and password fed by the python script got stored. Is it that the webserver UI refers to some other DB and the user name and password got stored somewhere else?
    Can you please help?

  2. Hi Marc,
    Thanks for the wonderful video on authentication. I was trying to implement it in a docker-compose environment with CeleryExecutor. The problem I am facing is even if I add the users through python script successfully, when I try to login, it gives me an error saying “Invalid login. Try again”. I am not sure if the environment is correctly setup. Just to make sure, when we add the user, does it get stored inside the postgres DB? or is there a separate DB for just the webserver?
    Your help would be great to resolve this issue.
    Thanks in advance.
