Apache Airflow | How to use the BashOperator

Wondering how you can execute bash commands through Airflow? The Airflow BashOperator does exactly that. It is a very simple but powerful operator that lets you execute a bash script, a single command, or a set of commands from your DAGs. You may have seen in my course “The Complete Hands-On Course to Master Apache Airflow” that I use this operator extensively in different use cases. Mastering this operator is a must, and that is what we are going to learn in this post.

Let’s start by looking at the following very simple DAG:

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.bash_operator import BashOperator

from datetime import datetime, timedelta

with DAG(dag_id='bash_dag', schedule_interval=None, start_date=datetime(2020, 1, 1), catchup=False) as dag:
    
    # Task 1
    dummy_task = DummyOperator(task_id='dummy_task')
    
    # Task 2
    bash_task = BashOperator(task_id='bash_task', bash_command="echo 'command executed from BashOperator'")

    dummy_task >> bash_task

The DAG “bash_dag” is composed of two tasks:

  • The task called “dummy_task” which basically does nothing.
  • The task “bash_task” which executes a bash command as shown from the parameter bash_command.

In order to know if the BashOperator executes the bash command as expected, the message “command executed from BashOperator” will be printed out to the standard output. Copy and paste the DAG into a file bash_dag.py and add it to the folder “dags” of Airflow. Next, start the webserver and the scheduler and go to the Airflow UI. From there, you should have the following screen:

airflow_bashoperator

Now, trigger the DAG by clicking on the toggle next to the DAG’s name and let the DAG run finish. Once it’s done, click on the Graph View icon.

From the Graph View, we can visualise the tasks composing the DAG and how they depend on each other. Click on the task “bash_task”, then on “View Log”.

airflow_bashoperator_3

Finally, if we take a look at the bottom of the logs, we can see that the message “command executed from BashOperator” has been printed as expected. This means the command was executed correctly by the BashOperator.

airflow_bashoperator_4

From the screenshot above, there are two things to notice. First, just above the output, the command executed by the BashOperator is also printed in the logs. This can be very useful for debugging, for example when you have a set of commands to execute. Second, the return code of the command is given. Any value other than 0 means that an error occurred. Alright, we have seen how to execute a simple command from the BashOperator, so let’s move forward.
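To make the log lines described above more concrete, here is a minimal sketch of what running a command through bash involves: the command is logged, its output is captured line by line, and the return code is reported. It uses only Python's standard subprocess module and assumes bash is available on the PATH. The helper name run_bash is hypothetical, and this is an illustration, not Airflow's actual implementation.

```python
import subprocess


def run_bash(command):
    """Run a command with bash and return (return_code, output_lines)."""
    print(f"Running command: {command}")
    # stderr is merged into stdout so that all output ends up in one stream.
    result = subprocess.run(
        ["bash", "-c", command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    lines = result.stdout.splitlines()
    for line in lines:
        print(f"Output: {line}")
    print(f"Command exited with return code {result.returncode}")
    return result.returncode, lines


ok_code, ok_lines = run_bash("echo 'command executed from BashOperator'")
bad_code, _ = run_bash("exit 3")  # any non-zero code signals an error
```

The first call returns a code of 0 together with the echoed message, while the second returns the non-zero code that bash exited with.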

Getting results from the BashOperator with XComs

One question you may ask yourself is: how do I get the result of the executed command? Well, that’s pretty simple actually. Let’s use the same DAG, this time with some slight modifications.

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.bash_operator import BashOperator

from datetime import datetime, timedelta

with DAG(dag_id='bash_dag', schedule_interval="@once", start_date=datetime(2020, 1, 1), catchup=False) as dag:
    
    # Task 1
    dummy_task = DummyOperator(task_id='dummy_task')
    
    # Task 2
    bash_task = BashOperator(task_id='bash_task', bash_command="whoami", xcom_push=True)

    dummy_task >> bash_task

If you look carefully at the BashOperator, we use a new parameter named “xcom_push” set to True. This parameter has been replaced by “do_xcom_push” in more recent Airflow versions, so if you get an error, use “do_xcom_push” instead. “xcom_push” allows you to push the last line written to stdout into an XCom. If you don’t know what an XCom is, I strongly encourage you to take a look at my course.

By setting the parameter “xcom_push” to True, we allow the BashOperator to push the result of the executed bash command into an XCom. Notice that the command here is “whoami”, which prints the current user. If you trigger the DAG and take a look at the logs of the task bash_task, you will get the following result:

airflow_bashoperator_5

As you can see, the user executing the command is “airflow” in my case. Notice that this username may be different for you. OK, great, but what about the XCom? Well, if you click on “Admin”, then “XComs”, you will see a new row.

As you can see, the value “airflow” corresponding to the bash user has been stored in the metadata database of Airflow with the key “return_value”. The key “return_value” indicates that this XCom has been created by returning the value from the operator. Alright, let me show you one more thing. Let’s say that this time you would like to have the return code of the executed bash command. How can you do it? Well, let’s take a look at the DAG below:

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.bash_operator import BashOperator

from datetime import datetime, timedelta

with DAG(dag_id='bash_dag', schedule_interval="@once", start_date=datetime(2020, 1, 1), catchup=False) as dag:
    
    # Task 1
    dummy_task = DummyOperator(task_id='dummy_task')
    
    # Task 2
    bash_task = BashOperator(task_id='bash_task', bash_command="whoami; echo $?", xcom_push=True)

    dummy_task >> bash_task

This time, instead of executing one bash command, we execute two commands, “whoami” and “echo $?”, separated by a semicolon. “echo $?” prints to the standard output the return code of the last executed command, “whoami”. If we take a look at the logs of the task “bash_task”:

airflow_bashoperator_6

Notice that this time we got two outputs, “airflow” and “0”, on two different lines, since we executed two different commands. Now if you take a look at the XComs view, you obtain the following row:

airflow_bashoperator_7

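The value changed from “airflow” to “0”, which is the return code of the command “whoami”. Only the last line written to stdout is pushed, so the username printed on the first line is no longer stored.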
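The last-line behaviour can be reproduced outside Airflow with a small sketch based on Python's standard subprocess module. The helper last_stdout_line is hypothetical, and “echo airflow” stands in for “whoami” so that the result does not depend on the user running it. It assumes bash is available on the PATH.

```python
import subprocess


def last_stdout_line(command):
    """Return the last line a bash command writes to stdout, or None."""
    result = subprocess.run(
        ["bash", "-c", command], stdout=subprocess.PIPE, text=True
    )
    lines = result.stdout.splitlines()
    return lines[-1] if lines else None


# One command: the last line is the only line.
single = last_stdout_line("echo airflow")
# Two commands: the last line is the return code printed by "echo $?".
double = last_stdout_line("echo airflow; echo $?")
print(single)
print(double)
```

If you need both the output and the return code downstream, one option is to print them on a single final line, since anything printed earlier is not kept.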


Executing bash scripts with the BashOperator

So far in this tutorial, we have seen how to execute commands with the BashOperator by passing them directly to the bash_command parameter. Now, what if we want to execute a bash script? Let’s find out.

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.bash_operator import BashOperator

from datetime import datetime, timedelta

with DAG(dag_id='bash_dag', schedule_interval="@once", start_date=datetime(2020, 1, 1), catchup=False) as dag:
    
    # Task 1
    dummy_task = DummyOperator(task_id='dummy_task')
    
    # Task 2
    bash_task = BashOperator(task_id='bash_task', bash_command="/usr/local/airflow/dags/command.sh ", xcom_push=True)

    dummy_task >> bash_task

#!/bin/bash

echo 'commands from a script'
pwd

The first code snippet is the DAG we are going to trigger, and the second one is the script “command.sh” that will be executed by the BashOperator. There are some very important points to keep in mind here. First, the bash script file must be executable: you have to run “chmod +x” on it, otherwise you will get a permission denied error. Then, in the BashOperator we specified the absolute path of the file command.sh. Pay attention to the trailing space added at the end of the path. This is not a mistake. You have to add a trailing space if you want to execute a bash script with the BashOperator. Why? Because bash_command is a templated field: when its value ends with “.sh”, Airflow treats it as the path of a Jinja template file to load instead of a command to run, and that lookup fails. The trailing space prevents this. If you don’t know what templating is, take a look at the tutorial I made on the topic. Finally, if you execute this DAG, you will get the following output:
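The effect of the trailing space can be illustrated with a simplified sketch of the check. The function name is hypothetical, and the tuple of extensions is an assumption meant to mirror the template extensions declared by the BashOperator. Airflow's real logic does more than this, so treat it as an illustration only.

```python
# Assumed to match the file extensions the BashOperator treats as templates.
TEMPLATE_EXT = (".sh", ".bash")


def is_treated_as_template_file(bash_command, template_ext=TEMPLATE_EXT):
    """Return True if the string ends with one of the template extensions."""
    # str.endswith accepts a tuple and matches any of its entries.
    return bash_command.endswith(template_ext)


without_space = is_treated_as_template_file("/usr/local/airflow/dags/command.sh")
with_space = is_treated_as_template_file("/usr/local/airflow/dags/command.sh ")
print(without_space, with_space)
```

Without the space, the string ends with “.sh” and is handled as a template file path. With the space, it no longer matches and is run as a regular command.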

airflow_bashoperator_7

Notice that even if we run a bash script, we still get the outputs corresponding to the executed commands in it.
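If you want to try the script outside Airflow first, the sketch below writes the same command.sh into a temporary folder, applies the equivalent of “chmod +x”, and runs it. It relies only on Python's standard library and assumes bash is installed at /bin/bash and that the temporary folder allows executing files. The temporary location is an assumption for illustration, not the dags folder used in this post.

```python
import os
import stat
import subprocess
import tempfile

script_body = "#!/bin/bash\n\necho 'commands from a script'\npwd\n"

with tempfile.TemporaryDirectory() as workdir:
    script_path = os.path.join(workdir, "command.sh")
    with open(script_path, "w") as handle:
        handle.write(script_body)

    # Equivalent of "chmod +x": add the execute bits to the current mode.
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    is_executable = os.access(script_path, os.X_OK)

    # Run the script directly, as the BashOperator would with its path.
    result = subprocess.run(
        [script_path], stdout=subprocess.PIPE, text=True, cwd=workdir
    )
    output_lines = result.stdout.splitlines()

print(output_lines)
```

The first output line is the echoed message and the second is the working directory printed by pwd.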

Jinja Templates with the BashOperator in Airflow

The last thing I would like to show you about the BashOperator is how you can combine it with Jinja templates. Let’s do this.

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.bash_operator import BashOperator

from datetime import datetime, timedelta

with DAG(dag_id='bash_dag', schedule_interval="@once", start_date=datetime(2020, 1, 1), catchup=False) as dag:
    
    # Task 1
    dummy_task = DummyOperator(task_id='dummy_task')
    
    # Task 2
    commands = """
    mkdir -p /usr/local/airflow/dags/{{ ds }};
    cp /usr/local/airflow/dags/command.sh /usr/local/airflow/dags/{{ ds }};
    sh /usr/local/airflow/dags/{{ ds }}/command.sh;
    """

    bash_task = BashOperator(task_id='bash_task', bash_command=commands)

    dummy_task >> bash_task

The example above shows you multiple features:

  • The set of commands to execute is stored in the variable “commands”, which is given to the bash_command parameter.
  • This set of commands uses Jinja templates. The curly brackets {{ }} are placeholders whose content is replaced at run time. Here, {{ ds }} is a macro that is replaced by the execution date of the DAG run in YYYY-MM-DD format.
  • Each time the DAG gets triggered, three commands are executed. First, a folder named after the current execution date is created in the dags folder of Airflow. Next, the bash script command.sh is copied from the dags folder into the newly created folder. Finally, the bash script is run.

One interesting feature of Airflow is that you can check your rendered Jinja templates before executing your DAG. In other words, you can see what the placeholders of your DAG will be replaced with. If you click on the DAG bash_dag, then “Graph View”, “bash_task” and “Rendered”, you will get the following output:

airflow_bashoperator_8

From the output above, we obtain the three commands rendered with the macro “ds” replaced by the current execution date. In my case it is “2020-02-25”, but this can be different for you. This feature is extremely useful for checking in advance that you didn’t make any mistakes.
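To see what the rendering step does to the commands, here is a small stand-in written with Python's standard library only. It is not Jinja: the hypothetical render function handles simple {{ name }} placeholders with a regular expression, which is enough to show the substitution of “ds” for the date used in this post.

```python
import re
from datetime import datetime

commands = """
mkdir -p /usr/local/airflow/dags/{{ ds }};
cp /usr/local/airflow/dags/command.sh /usr/local/airflow/dags/{{ ds }};
sh /usr/local/airflow/dags/{{ ds }}/command.sh;
"""


def render(template, context):
    """Replace each {{ name }} placeholder with context[name]."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(context[match.group(1)]),
        template,
    )


# "ds" is the execution date formatted as YYYY-MM-DD.
execution_date = datetime(2020, 2, 25)
rendered = render(commands, {"ds": execution_date.strftime("%Y-%m-%d")})
print(rendered)
```

Each of the three commands ends up with the date in place of the placeholder, which is what the “Rendered” view lets you verify before the task runs.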


Conclusion

In this tutorial, we started by executing a very simple bash command with the BashOperator. Then, we discovered how to push the result of that command into an XCom and how to execute multiple commands at once. Next, we learned how to run a bash script file from the BashOperator and how to use Jinja templates to inject values at run time. You now know everything you need to use the BashOperator in your DAGs. Well done! If you want to learn more about Airflow, check out my courses.
