Airflow Trigger Rules

By default, your tasks will only execute after all parent tasks have successfully completed. This is generally expected behavior. But what if you want something more complex? What if you would like to execute a task as soon as one of its upstream tasks succeeds? Or execute a different set of tasks if another fails? Mastering one concept is essential when addressing multiple use cases with your data pipelines: the Airflow Trigger Rules.

If you want to become an Apache Airflow expert and discover its unique features, check out my courses here.

Use Cases

As usual, let’s begin with some concrete use cases to understand better why this feature might be helpful for you.

Reacting when a task fails

What can you do when a task fails? Airflow offers different mechanisms, but a common way to react to a failure is to use callbacks. It’s simple: pass a function to the operator’s on_failure_callback parameter, and as soon as the task fails, Airflow calls that function (a minimal sketch is shown after the list below). It’s great, but there are some limitations:

  • Callbacks are not managed by the scheduler. If a callback fails, it is not retried and you are not warned.
  • Callbacks should stay lightweight. If you need to run something more complex, you need another mechanism.
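
For reference, here is a minimal sketch of the callback mechanism; the task, the bash command, and the _notify_on_failure function are illustrative placeholders, and the operator is assumed to sit inside a DAG definition:

from airflow.operators.bash import BashOperator

def _notify_on_failure(context):
    # Airflow calls this with the task's context dict when the task fails.
    # Keep it lightweight: if it raises, the scheduler does not retry it.
    print(f"Task {context['task_instance'].task_id} failed")

extract = BashOperator(
    task_id='extract',
    bash_command='exit 1',  # fails on purpose, just to trigger the callback
    on_failure_callback=_notify_on_failure,
)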

One way to address these downsides is to use Airflow Trigger Rules!

Solving the BranchPythonOperator pitfall

Let’s take a look at the following data pipeline:

[Image: the BranchPythonOperator pitfall]

choose_model uses the BranchPythonOperator to choose between is_inaccurate and is_accurate, and the goal is to execute store regardless of which branch is selected. However, as you can see above, that is not what happens. When a task is skipped, all of its direct downstream tasks get skipped as well. What if you want to always execute store? Airflow trigger rules!
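
As a rough sketch of that pipeline (the task names come from the example above, the branching logic is a placeholder, and EmptyOperator assumes Airflow 2.3+; use DummyOperator on older versions):

from airflow.operators.python import BranchPythonOperator
from airflow.operators.empty import EmptyOperator

def _choose_model():
    # Placeholder logic: return the task_id of the branch to follow
    return 'is_accurate'

choose_model = BranchPythonOperator(
    task_id='choose_model',
    python_callable=_choose_model,
)
is_inaccurate = EmptyOperator(task_id='is_inaccurate')
is_accurate = EmptyOperator(task_id='is_accurate')
store = EmptyOperator(task_id='store')  # skipped with the default trigger rule

choose_model >> [is_inaccurate, is_accurate] >> store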

Airflow trigger rules: What are they?

I think trigger rules are the easiest concept to understand in Airflow.

Basically, a trigger rule defines the condition under which a task runs, based on the states of its direct upstream tasks. By default, all tasks have the same trigger rule, all_success, meaning a task runs only if all of its upstream tasks have succeeded. Only one trigger rule can be specified per task:

from airflow.operators.python import PythonOperator

my_task = PythonOperator(
    task_id='my_task',
    python_callable=_my_task,  # any Python callable defined elsewhere
    trigger_rule='all_success',  # the default trigger rule
)

There are many trigger rules. Let’s see each one of them.

The list of Airflow trigger rules

all_success

This one is straightforward, and you already use it: a task runs when all of its direct upstream tasks have succeeded.

[Image: all_success behavior]

However, if an upstream task is skipped, then the downstream task is skipped as well:

[Image: all_success with a skipped upstream task]

all_failed

Simple: a task runs if all of its direct upstream tasks have failed.

[Image: all_failed behavior]

This Airflow trigger rule is handy if you want to do some cleanup or something more complex that you can’t put within a callback. You can define a set of tasks to execute if some tasks fail.

Like with all_success, the task will also be skipped if one of the direct upstream tasks is skipped.
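
For instance, a hypothetical cleanup task that runs only when every upstream task has failed could be declared like this (the callable and task_id are illustrative):

from airflow.operators.python import PythonOperator

def _cleanup():
    print("cleaning up after a full failure")

cleanup = PythonOperator(
    task_id='cleanup',
    python_callable=_cleanup,
    trigger_rule='all_failed',
)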

all_done

Trigger a task as soon as all direct upstream tasks are done, regardless of their states.

This trigger rule might be useful if there is a task that you always want to execute regardless of the upstream tasks’ states, such as cleaning up some resources.
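
For example, a hypothetical resource-cleanup task that runs once all upstream tasks are done, whatever their outcome, might look like this:

from airflow.operators.bash import BashOperator

release_resources = BashOperator(
    task_id='release_resources',
    bash_command='echo "releasing resources"',  # placeholder command
    trigger_rule='all_done',  # runs once all upstream tasks are done, regardless of state
)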

one_failed

As soon as one of the upstream tasks fails, your task runs.

[Image: one_failed behavior]

Useful if you have long-running tasks and want to do something as soon as one fails.
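
As a sketch, a hypothetical alerting task could use this rule so it fires as soon as any upstream task fails, without waiting for the others; the TriggerRule enum is equivalent to passing the plain string:

from airflow.operators.python import PythonOperator
from airflow.utils.trigger_rule import TriggerRule

def _alert():
    print("at least one upstream task failed")

alert = PythonOperator(
    task_id='alert',
    python_callable=_alert,
    trigger_rule=TriggerRule.ONE_FAILED,  # same as trigger_rule='one_failed'
)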

one_success

Like one_failed, but the opposite: your task runs as soon as one of the upstream tasks succeeds.

none_failed

Your task runs if all direct upstream tasks have succeeded or been skipped.

Only useful if you want to handle the skipped status.

none_skipped

A task runs if no direct upstream task is in a skipped state.

none_failed_min_one_success

Previously known as none_failed_or_skipped (before Airflow 2.2), this trigger rule makes your task run if no direct upstream task has failed and at least one has succeeded.

This is the rule you must use to handle the BranchPythonOperator pitfall 😎
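
Going back to the pipeline sketch above, changing the trigger rule of store is enough; this assumes Airflow 2.2+ for the rule name (use none_failed_or_skipped on older versions):

from airflow.operators.empty import EmptyOperator

store = EmptyOperator(
    task_id='store',
    # Runs as long as no upstream task failed and at least one succeeded,
    # so the skipped, non-selected branch no longer skips it.
    trigger_rule='none_failed_min_one_success',
)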

Conclusion

Airflow trigger rules are simple to use yet powerful, as they open the door to new use cases. Don’t hesitate to use them to handle errors more reliably than with callbacks. Trigger rules are great for managing resources, but there is an even better way with setup and teardown tasks that you can find here.

Last, I did a LinkedIn post here with a one-page summary of the most important trigger rules, showing how each behaves when upstream tasks succeed and when they fail.

That’s it! Enjoy ❤️

PS: If you want to get started with Airflow now, take a look at the course I made for you here.
