Data pipeline architecture: Airflow triggered by a message broker

Let us say we have:

  • a web app with a Postgres DB that produces data over time,
  • another DB optimized for analytics that we would like to populate over time.

My goal is to build and monitor an ETL pipeline that will transform the data and write it to the analytics DB. (Preferably using a streaming approach.)

My idea was to:

  • have the web app produce a JSON representation of the relevant data and write it to a message broker of some kind (say RabbitMQ), roughly as in the producer sketch after this list,
  • use Airflow to structure and monitor the ETL process.
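
For concreteness, here is a rough sketch (in Python, using pika) of the producer side I have in mind. The queue name "etl_events", the connection details, and the example payload are placeholders, not anything I have running yet:

    import json
    import pika

    def publish_change(record: dict) -> None:
        # Serialize the relevant row/event as JSON and push it onto a durable queue.
        connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
        channel = connection.channel()
        channel.queue_declare(queue="etl_events", durable=True)
        channel.basic_publish(
            exchange="",
            routing_key="etl_events",
            body=json.dumps(record),
            properties=pika.BasicProperties(delivery_mode=2),  # mark message persistent
        )
        connection.close()

    # e.g. called from the web app whenever relevant data changes
    publish_change({"table": "orders", "id": 42, "event": "insert"})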

The part I am confused about is:

  1. How can I have Airflow subscribe to the message broker so that a given pipeline is triggered every time a message is received? (A rough sketch of what I have in mind follows this list.)
  2. (If the above is possible) does this approach make sense? (In particular, using Airflow in a stream-like fashion rather than in a scheduled, batch fashion.)
  3. (If 1 is not possible) I have a few follow-up questions below.
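
To make question 1 concrete, the only mechanism I can picture so far is a small standalone consumer that sits outside Airflow, listens on the queue, and triggers one DAG run per message through Airflow's stable REST API. This is just a sketch assuming Airflow 2.x with the REST API and basic auth enabled; the URL, credentials, and the "etl_pipeline" DAG id are placeholders:

    import json
    import pika
    import requests

    AIRFLOW_URL = "http://localhost:8080"
    AUTH = ("airflow", "airflow")  # basic-auth credentials (placeholder)

    def on_message(channel, method, properties, body):
        payload = json.loads(body)
        # One DAG run per message; the message itself becomes the run conf.
        requests.post(
            f"{AIRFLOW_URL}/api/v1/dags/etl_pipeline/dagRuns",
            json={"conf": payload},
            auth=AUTH,
            timeout=10,
        )
        channel.basic_ack(delivery_tag=method.delivery_tag)

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="etl_events", durable=True)
    channel.basic_consume(queue="etl_events", on_message_callback=on_message)
    channel.start_consuming()

This would act as a push-style trigger, but the consumer lives outside Airflow itself, which is part of what makes me unsure whether it is idiomatic (hence question 2).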

Follow-up questions if the architecture above is not feasible:

  1. why is it not possible?
  2. what would be an alternative way to build a streaming approach that still allows some form of monitoring/management of tasks out of the box?
  3. would it be a more standard approach to let Airflow schedule retrieval of the data in small batches (something like the DAG sketch after this list), and what are the trade-offs between that approach and a streaming approach?
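
For follow-up question 3, this is the scheduled micro-batch alternative I have in mind: a DAG that runs every few minutes and processes whatever accumulated in the app DB since the last run. This is only a sketch assuming Airflow 2.4+; the DAG id, the cron interval, and the queries are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_transform_load(**context):
        # The data interval bounds the batch, so each run only picks up new rows.
        start = context["data_interval_start"]
        end = context["data_interval_end"]
        # SELECT ... FROM the app DB WHERE updated_at >= start AND updated_at < end
        # ... transform ...
        # INSERT the result into the analytics DB
        ...

    with DAG(
        dag_id="etl_micro_batches",
        start_date=datetime(2023, 1, 1),
        schedule="*/5 * * * *",  # every five minutes
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="extract_transform_load",
            python_callable=extract_transform_load,
        )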

Any feedback on these questions is very welcome.