Variables in Apache Airflow: The Guide

How do you deal with variables in Apache Airflow? In this tutorial, you are going to learn everything you need to know about variables in Airflow: what they are, how they work, how to define them, how to get them, and more. If you followed my course “Apache Airflow: The Hands-On Guide”, variables should sound familiar, as we briefly used them in one lesson. This time, I’m going to share everything I know about variables so that by the end, you will be ready to use them in your DAGs. Without further ado, let’s get started!

As you may already know, one of the greatest benefits of using Airflow is the ability to create dynamic DAGs. An example of a dynamic data pipeline could be the creation of N tasks based on a changing list of filenames. These N tasks will be instantiated based on those filenames (the filenames already exist). Notice that you could do the same with databases. Now the question is, where should this list of filenames be stored so that the DAG can fetch it? Hard-coded in the DAG? Hell no! In a variable? Hmm, seems to be a better idea!

Let me give you another example. Let’s say you have settings that your DAGs need to configure the KubernetesPodOperator. This operator expects many parameters such as resource limits for CPU and memory utilization, ports, volumes, and so on. Again, instead of hard-coding these different values, you could define a variable with a JSON dictionary describing these settings.

You can get or set variables from your DAGs, from the UI, or from the command line interface. Notice that it is also possible to use variables in Jinja templates and make your DAGs truly dynamic.

Bottom line: Variables are useful for storing and retrieving data at runtime while avoiding hard-coding values and duplicating code in your DAGs.

Let’s discover how variables work in Apache Airflow.

How do variables work?

Apache Airflow needs three components to work: the web server, the scheduler, and the metadata database. Let’s focus on the metadata database. This database can be any SQL database compatible with SQLAlchemy, such as Postgres, MySQL, or SQLite. When you initialise Airflow, many tables populated with default data are created. One of these tables is “variable”, where Airflow stores the variables. By looking more closely at the table, here is what we get:

                                       Table "public.variable"
    Column    |          Type          | Collation | Nullable |               Default
--------------+------------------------+-----------+----------+--------------------------------------
 id           | integer                |           | not null | nextval('variable_id_seq'::regclass)
 key          | character varying(250) |           |          |
 val          | text                   |           |          |
 is_encrypted | boolean                |           |          |
Indexes:
    "variable_pkey" PRIMARY KEY, btree (id)
    "variable_key_key" UNIQUE CONSTRAINT, btree (key)

The four columns are:

  • id: incrementing numeric value that will be automatically assigned to your variable
  • key: literal string used to retrieve your variable in the table. Must be UNIQUE.
  • val: literal string corresponding to the value of your variable.
  • is_encrypted: boolean indicating if the value of the variable is encrypted or not. As long as the FERNET_KEY parameter is set, your variables will be encrypted by default. If you don’t know what I’m talking about, check my course, where I show you how it works.

Once you create a variable in Airflow, here is what you get:

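 id |     key      |                                                 val                                                  | is_encrypted
----+--------------+------------------------------------------------------------------------------------------------------+--------------
  1 | my_first_var | gAAAAABeoWewkbpjOJmhgaWx73VpCHPI858rq4e9kawYGxzrJNpgSM63mIouJsNaM15TRtqX4NNih-MiKSed9468ZLLCygwdfA== | t
(1 row)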

As you can see, the variable has the key my_first_var and the id 1, since it is the first variable in the table, and a very strange string as its value. That’s because Airflow encrypts variable values by default. Don’t worry, I will come back to it in a minute.

All right, so we know where variables are stored. Now, what’s the catch?

Best practices with variables in Airflow

Something you should absolutely know about Airflow is how the scheduler works. The scheduler is the centerpiece of Airflow. Understanding its mechanism will help you avoid some gotchas that can drastically reduce the performance of your Airflow instance. One of the most common mistakes I see in DAGs is code outside of task definitions.

The scheduler parses all the DAGs at a specific interval of time defined by the parameter min_file_process_interval in airflow.cfg. This parameter is set to 30 seconds by default. That means the scheduler parses your DAGs every 30 seconds, and any code written outside of tasks is executed on every parse. Since variables create a connection to the metadata database every time they fetch a value, if you set or get a variable outside of tasks, you may end up with a lot of open connections. Having many DAGs with many variables outside of tasks can be a big waste of resources and decrease performance.

Bottom line: Don’t write any code outside of tasks.
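To make the cost concrete, here is a minimal sketch in plain Python. It does not use Airflow at all: fake_variable_get and the two parse_dag_file_* functions are hypothetical stand-ins for Variable.get and for the scheduler re-reading a DAG file, and the 10 parses for 1 task run are made-up numbers.

```python
# Plain-Python simulation; nothing here is Airflow's real API.
calls = {"db": 0}  # counts simulated round trips to the metadata database


def fake_variable_get(key):
    """Hypothetical stand-in for Variable.get: one database call per lookup."""
    calls["db"] += 1
    return f"value_of_{key}"


def parse_dag_file_top_level():
    """Simulates a DAG file that reads a variable outside of any task."""
    fake_variable_get("my_key")  # runs on every parse

    def task():
        return None

    return task


def parse_dag_file_in_task():
    """Simulates a DAG file that reads the variable only inside the task."""

    def task():
        return fake_variable_get("my_key")  # runs only when the task executes

    return task


# The file is parsed 10 times; only the parsing itself is counted here.
for _ in range(10):
    parse_dag_file_top_level()
calls_top_level = calls["db"]

# Same 10 parses, then the task runs once.
calls["db"] = 0
for _ in range(10):
    task = parse_dag_file_in_task()
task()
calls_in_task = calls["db"]

print(calls_top_level, calls_in_task)  # prints: 10 1
```

The number of lookups follows the number of parses in the first case and the number of task runs in the second, which is the whole point of keeping get/set calls inside tasks.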

How to set a variable in Airflow?

There are three ways of defining variables in Apache Airflow. The most intuitive way is through the User Interface. 

User Interface

Nothing much to say: just go to Admin -> Variables and click on the + to open the page where you create a variable.

Command line interface

The second way of creating variables is by using the command line interface.


You can perform CRUD operations on variables with the command airflow variables. The command below allows you to set a variable my_second_var with the value my_value.

airflow variables set my_second_var my_value

We can also export the variables to a JSON file:

airflow variables export my_variables.json

Then, if we open the file my_variables.json we get:

{
  "my_first_var": "my_first_value",
  "my_second_var": "my_value"
}

The last way of defining variables is in the code. Indeed, you can create variables directly from your DAGs with the following code snippet:

from airflow.models import Variable
# Variable.set stores the value and returns nothing, so there is no result to assign
Variable.set("my_key", "my_value")

Remember, don’t put any get/set of variables outside of tasks.

All right, now that you know how to define variables, let’s discover how to get them.

How to get a variable in Airflow?

You can get variables in the same ways you set them. One very important thing to keep in mind is where Airflow looks first for your variables. Airflow checks the secrets backend first, then the Airflow environment variables, and only then the metadata database.

There are two “components” here that we haven’t covered yet: the secrets backend and Airflow environment variables. Don’t worry, I will come back to them later in the tutorial. Just keep in mind that Airflow goes through 2 “layers” before reaching the metadata database. If the variable is found in one of these two layers, Airflow doesn’t need to create a connection to the database, so it is better optimized. That being said, let’s discover the different ways of getting a variable in Airflow.
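If it helps, the layered lookup can be pictured with a small plain-Python sketch. This is a simulation, not Airflow’s implementation: lookup_variable and the three dictionaries are hypothetical, and the order (secrets backend, then environment variables, then metadata database) is an assumption based on the order Airflow documents.

```python
# Plain-Python simulation of the lookup order; not Airflow's implementation.
def lookup_variable(key, secrets_backend, env, metastore):
    """Return (value, layer) from the first layer that knows the key."""
    if key in secrets_backend:
        return secrets_backend[key], "secrets backend"
    env_name = "AIRFLOW_VAR_" + key.upper()  # naming convention for env variables
    if env_name in env:
        return env[env_name], "environment variable"
    if key in metastore:
        return metastore[key], "metadata database"
    raise KeyError(key)


# Hypothetical content of each layer.
secrets_backend = {}
env = {"AIRFLOW_VAR_MY_KEY": "from_env"}
metastore = {"my_key": "from_db", "other_key": "only_in_db"}

found_early = lookup_variable("my_key", secrets_backend, env, metastore)
found_late = lookup_variable("other_key", secrets_backend, env, metastore)
print(found_early)
print(found_late)
```

Here my_key is resolved by the environment layer, so the simulated database is never consulted, while other_key falls through to the last layer.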

User Interface

Well, that’s pretty simple: just go to the UI, then Admin -> Variables, and you will get access to your variables.

Command Line Interface

In order to get a variable through the command line interface, execute the following command:

airflow variables get key_of_the_variable

and you will get the decrypted value of the variable.


In your DAGs, there are two ways of getting your variables. Either by using the class “Variable” as shown below:

from airflow.models import Variable
my_var = Variable.get("my_key")

Or, by leveraging Jinja if you are trying to fetch a variable from a template: 

 {{ var.value.<variable_key> }}

Now you might say, “What the hell is Jinja and a template?”, well you’re lucky because I made a tutorial about it right here 🙂

All right, at this point, you just learned the basics of dealing with variables in Airflow and a little more. In the next sections, we are going to dive a little deeper and discover other ways of using them.

How to hide the value of a variable?

One thing you may want is to hide the values of your sensitive variables in the UI. Indeed, if you store AWS keys or passwords, it is better to prevent anyone from reading them. Hiding your variables in Airflow is pretty easy: you just need to include one of the sensitive keywords that Airflow looks for, such as secret or password, in the key of your variable.

For example, if I have a variable with the key aws_secret_access_key, Airflow hides its value in the UI.
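The idea behind this key-based hiding can be sketched in a few lines of plain Python. This is only an illustration: should_hide and display_value are hypothetical helpers, the keyword tuple is an assumed subset of what Airflow checks, and the mask string is made up.

```python
# Illustration only; the keyword list is an assumed subset, not Airflow's full list.
SENSITIVE_KEYWORDS = ("password", "passwd", "secret", "api_key")


def should_hide(key):
    """True when the variable key contains one of the sensitive keywords."""
    lowered = key.lower()
    return any(word in lowered for word in SENSITIVE_KEYWORDS)


def display_value(key, value):
    """What a UI could show: a mask for sensitive keys, the value otherwise."""
    return "********" if should_hide(key) else value


print(display_value("aws_secret_access_key", "abc123"))  # prints: ********
print(display_value("my_first_var", "my_first_value"))   # prints: my_first_value
```

Notice that the decision depends only on the key, never on the value, so a sensitive value stored under a neutral key would still be displayed.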

How to mix variables and templates in DAGs?

If you want to unleash Airflow’s full power and flexibility, you have to understand how the Jinja template engine works and what you can do by mixing templates with variables and macros. You will be able to modify and insert data into your DAGs at runtime, and act according to these values.

For example, let’s say you have a DAG fetching credit card movements from clients in order to process them. Instead of a DAG where a single task processes all of the clients at once, you could have a DAG where the task “fetching_clients” fetches the list of clients and then, for every client, a task with their name is created. Keep in mind that the list of clients is already predefined in the variable.

Why is it better? In the first DAG, if the processing fails for one client, you have to retry the task for all of your clients and find out which client is in error. In the second DAG, you are able to quickly identify the client and retry only the corresponding task. Better optimized, more robust, and faster results.
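The “one task per client” idea boils down to looping over a predefined list. Here is a minimal sketch in plain Python, assuming the variable holds a JSON list of client names. The names, the process_ prefix and build_tasks are all hypothetical, and in a real DAG each entry would be an operator instead of a plain function.

```python
import json

# Hypothetical content of the variable: a JSON list of client names.
clients_variable = '["alice", "bob", "carol"]'


def build_tasks(raw_clients):
    """Create one named callable per client, like one task per client."""
    clients = json.loads(raw_clients)
    # c=c binds the current client to each callable (avoids the late-binding trap)
    return {f"process_{c}": (lambda c=c: f"processed {c}") for c in clients}


tasks = build_tasks(clients_variable)
print(list(tasks))             # prints: ['process_alice', 'process_bob', 'process_carol']
print(tasks["process_bob"]())  # prints: processed bob
```

If processing fails for bob, only process_bob has to be retried, and its name tells you right away which client is in error.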

After this quick introduction, if you want to learn how to deal with templates, variables, and macros in Airflow, check out my tutorial right here.

Optimizing variables with the JSON format

If you have multiple values with a possible hierarchy that you would like to store in a variable, like configuration settings, it would be more suitable to store this data in a friendly format. Well, Airflow allows you to set and get variables in JSON format. For example, let’s say we have the following JSON data:

{
  "login": "my_login",
  "password": "my_password",
  "config": {
    "role": "admin"
  }
}

If you store it in a variable named “settings”, you can get this JSON data from your DAG with:

from airflow.models import Variable
# deserialize_json=True parses the stored JSON string into a Python dictionary
settings = Variable.get("settings", deserialize_json=True)
# And be able to access the values like in a dictionary
print(settings['login'])
print(settings['config']['role'])

Or if you are in a template:

 {{ var.json.<variable_key> }}

When you have multiple values that can be logically grouped together, I strongly encourage you to store them in a single variable in JSON format. By doing this, you avoid making multiple requests to the metadata database, as only one is enough to get everything you need. Fewer connections are better.
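To see the difference in numbers, here is a rough plain-Python sketch. The store dictionary stands in for the metadata database and fake_get for Variable.get; both are hypothetical, and only the lookups are counted.

```python
import json

# Simulated metadata database: three flat variables and one JSON variable.
store = {
    "login": "my_login",
    "password": "my_password",
    "role": "admin",
    "settings": json.dumps(
        {"login": "my_login", "password": "my_password", "config": {"role": "admin"}}
    ),
}
lookups = {"count": 0}


def fake_get(key):
    """Hypothetical stand-in for Variable.get: one lookup per call."""
    lookups["count"] += 1
    return store[key]


# Three separate variables: three lookups.
flat_values = [fake_get(key) for key in ("login", "password", "role")]
flat_lookups = lookups["count"]

# One JSON variable: a single lookup, then plain dictionary access.
lookups["count"] = 0
settings = json.loads(fake_get("settings"))
json_lookups = lookups["count"]

print(flat_lookups, json_lookups)  # prints: 3 1
print(settings["config"]["role"])  # prints: admin
```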

Storing variables in environment variables

If you remember the schema showing the different layers at which Airflow tries to fetch variables, there is one layer before the metastore called “AIRFLOW_ENV”.

Since Apache Airflow 1.10.10, it is possible to store and fetch variables from environment variables just by using a special naming convention. Any environment variable named AIRFLOW_VAR_<KEY_OF_THE_VAR> will be taken into account by Airflow.

Concretely, in your bash session, you could execute the following commands:

export AIRFLOW_VAR_AWS_ACCESS_KEY_ID="wejfhwfhwwner"
# Or in JSON
export AIRFLOW_VAR_SETTINGS='{"login":"marc", "password": "my_pass", "config": { "role": "admin" }}'

These commands create two variables, AWS_ACCESS_KEY_ID and SETTINGS. Fetching them from your DAGs is done exactly like with any other variable.
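The naming convention itself is easy to reproduce with the standard library. The sketch below only mimics it: read_env_variable is a hypothetical helper, not an Airflow function, and in a real DAG you would simply call Variable.get as usual.

```python
import json
import os

# Same kind of value as an export in a bash session, set here for the demo.
os.environ["AIRFLOW_VAR_SETTINGS"] = (
    '{"login": "marc", "password": "my_pass", "config": {"role": "admin"}}'
)


def read_env_variable(key, deserialize_json=False):
    """Hypothetical helper: read AIRFLOW_VAR_<KEY> from the environment."""
    raw = os.environ["AIRFLOW_VAR_" + key.upper()]
    return json.loads(raw) if deserialize_json else raw


settings = read_env_variable("settings", deserialize_json=True)
print(settings["login"])           # prints: marc
print(settings["config"]["role"])  # prints: admin
```

Without deserialize_json, the helper returns the raw string, exactly as it was exported.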


How to get environment variables from your DAGs?

What if you would like to fetch an environment variable that isn’t prefixed by AIRFLOW_VAR_? Well, in your DAG, you could do the following:

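import os
from airflow import DAG
from airflow.operators.python import PythonOperator
dag = DAG(...)
def print_env_var():
    print(os.environ["CASSANDRA_PASSWORD"])
# The task_id below is illustrative
print_context = PythonOperator(task_id="print_env_var", python_callable=print_env_var, dag=dag)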

Nothing specific to Airflow here, it’s what you would do in any Python code. Here, we print the value of the environment variable CASSANDRA_PASSWORD (which is not really smart btw ;p ) using the Python module os.


All right, a pretty dense tutorial, isn’t it? I hope you really enjoyed what you’ve learned. Airflow is a really powerful orchestrator with many features to discover. If you want to discover Airflow, go check my course The Complete Hands-On Introduction to Apache Airflow right here. Or if you already know Airflow and want to go way further, enroll in my 12-hour course here.

You may have noticed that I didn’t talk about the secrets backend yet. Well, that’s for another video/tutorial very soon. If you want to stay in touch, fill out the form below.

Have a great day! 🙂 


7 thoughts on “Variables in Apache Airflow: The Guide”

  1. Thanks for a lot for the info on airflow variables.These airflow variables can be referred multiple time in different dags .But if the scenario is to update variable these variable multiple times as during parallel run and them it would not be useful in that case. Can you suggest how to handle such situations

  2. Hi Marc,
    Thank you for this topic, it helps me so much; specialy for how to get and set variable airflow for this moment.

  3. Hello Marc, is there a way we can use one airflow variable while defining another?
    eg. first variable {“key”: “api_key”, “value”: “xyz”}
    & second variable {“key”: “connection_string”, “value”: “[api_key]”
    something like that..
    Does my question make sense

  4. Pranay Choudhary

    Hey Marc,
    Creating tasks dynamically at runtime is not possible in airflow as per my knowledge. SO the way you have suggested of implementing the alternate dag doesn’t seem possible. (Where the task “fetching_clients” will fetch the list of clients and then, for each client, a task with their name will be created.) Please correct me if I am wrong
    and if available can you please share the code, for how to implement it.
