Jed Rembold
August 6, 2025
Getting Airflow metrics into Prometheus requires an intermediate step
Airflow has a built-in way of exporting metrics to a system called StatsD
We can turn this on, and then set up a statsd-exporter that makes the metrics available in Prometheus format
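To make the first hop concrete, here is a minimal Python sketch of what a StatsD client sends: one plain-text line per metric in the form name:value|type, typically over UDP. The helper names, the default host, and the sample metric name are assumptions for illustration; Airflow's own StatsD client handles all of this once the export is switched on.

```python
import socket

def format_statsd(name: str, value: float, kind: str = "c") -> bytes:
    """Build one StatsD line: <name>:<value>|<type> (c=counter, g=gauge, ms=timer)."""
    return f"{name}:{value}|{kind}".encode("ascii")

def send_statsd(packet: bytes, host: str = "localhost", port: int = 9125) -> None:
    """Fire-and-forget the packet over UDP (host and port are assumed values)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))

# Hypothetical counter increment, similar in shape to what Airflow emits.
packet = format_statsd("airflow.scheduler_heartbeat", 1)
print(packet)
```

Calling send_statsd(packet) against a running statsd-exporter should make the counter show up on its metrics page, assuming the exporter listens on UDP port 9125.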
The easiest way to turn on StatsD export is to set 3 environment variables in Airflow’s docker-compose.yml:
AIRFLOW__METRICS__STATSD_ON: 'true'
AIRFLOW__METRICS__STATSD_PORT: 9125
AIRFLOW__METRICS__STATSD_HOST: 'example.advde'
The StatsD Exporter benefits from a cheat-sheet for how to translate StatsD names into Prometheus names
Place statsd_mapping.yml in the same location as prometheus.yml
Then we can set up the exporter in Docker Compose
services:
  statsd-exporter:
    image: prom/statsd-exporter
    ports:
      - 9102:9102 # Where to access metrics
      - 9125:9125 # Incoming metrics
      - 9125:9125/udp
    volumes:
      - ./statsd_mapping.yml:/tmp/statsd_mapping.yml
    command:
      - '--statsd.mapping-config=/tmp/statsd_mapping.yml'
There is nothing magical about the new Prometheus config:
...
scrape_configs:
  ...
  - job_name: 'airflow'
    static_configs:
      - targets: ['example.advde:9102']
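To illustrate what the mapping cheat-sheet does, here is a rough Python sketch of how one glob-style rule could turn a dotted StatsD name into a Prometheus metric name plus labels. This is a simplified emulation, not the exporter's actual implementation; the rule contents (the match pattern, airflow_task_duration, and the label names) are hypothetical and only mirror the general shape of an entry in statsd_mapping.yml.

```python
import re

# Hypothetical mapping rule, shaped like one entry in statsd_mapping.yml.
RULE = {
    "match": "airflow.dag.*.*.duration",
    "name": "airflow_task_duration",
    "labels": {"dag_id": "$1", "task_id": "$2"},
}

def apply_rule(statsd_name, rule):
    """Return (prometheus_name, labels) if the rule matches, else None."""
    # Each '*' in the glob captures exactly one dot-separated component.
    pattern = "^" + re.escape(rule["match"]).replace(r"\*", r"([^.]+)") + "$"
    m = re.match(pattern, statsd_name)
    if m is None:
        return None
    labels = {}
    for key, template in rule["labels"].items():
        # Replace $1, $2, ... with the corresponding captured component.
        labels[key] = re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), template)
    return rule["name"], labels

print(apply_rule("airflow.dag.etl.load.duration", RULE))
```

Without such a rule, the DAG and task names stay baked into the metric name, which makes it awkward to aggregate across DAGs in Prometheus queries.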
As per usual, we can deploy both Loki and Promtail from Docker Compose
services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/loki-config.yaml
  promtail:
    image: grafana/promtail:latest
    volumes:
      - ./promtail-config.yaml:/etc/promtail/promtail-config.yaml
      - /var/log:/var/log
      - /other_log_sources:/mnt/other_log_sources
You shouldn’t need to alter the default Loki config at all. The defaults direct from the website are:
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: debug
  grpc_server_max_concurrent_streams: 1000
common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100
limits_config:
  metric_aggregation_enabled: true
schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
pattern_ingester:
  enabled: true
  metric_aggregation:
    loki_address: localhost:3100
ruler:
  alertmanager_url: http://localhost:9093
frontend:
  encoding: protobuf
You would probably mostly want to add new scrape jobs to the default Promtail config
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log
assets, we reinforced how we could explicitly follow this
SQLExecuteQueryOperator and PostgresOperator are two main ones
Marquez prefers you use some helper scripts to launch its services (which use Docker Compose under the hood)
Need to check out the Git repo and then enter it:
git clone https://github.com/MarquezProject/marquez && cd marquez
Start the services (the server, web UI, and database for storage) with:
./docker/up.sh --db-port 2345
The --db-port portion ensures the new database doesn’t conflict with your warehouse
The web UI is then available at localhost:3000
It is actually trivially easy to set up Airflow to talk to Marquez
Set two environment variables in your Airflow Docker Compose
...
AIRFLOW__OPENLINEAGE__TRANSPORT: '{"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}'
AIRFLOW__OPENLINEAGE__NAMESPACE: 'my-team-airflow-instance'
...
Newer Airflow Docker images already ship with the necessary providers
Restart your Airflow stack and they are connected!
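Because the transport value is JSON embedded inside a YAML string, quoting mistakes are easy to make. A minimal Python sketch, assuming the same URL and namespace as above, generates both values and shows the full address that lineage events would be posted to. The function name openlineage_env is hypothetical and not part of Airflow or Marquez.

```python
import json

def openlineage_env(marquez_url: str, namespace: str) -> dict:
    """Return the two Airflow environment variables as a name -> value dict."""
    transport = {"type": "http", "url": marquez_url, "endpoint": "api/v1/lineage"}
    return {
        # json.dumps guarantees valid JSON with double quotes.
        "AIRFLOW__OPENLINEAGE__TRANSPORT": json.dumps(transport),
        "AIRFLOW__OPENLINEAGE__NAMESPACE": namespace,
    }

env = openlineage_env("http://localhost:5000", "my-team-airflow-instance")
for key, value in env.items():
    # Single quotes keep the JSON's double quotes intact inside YAML.
    print(f"{key}: '{value}'")

# Full URL that lineage events would be posted to, given these settings.
parsed = json.loads(env["AIRFLOW__OPENLINEAGE__TRANSPORT"])
lineage_url = parsed["url"].rstrip("/") + "/" + parsed["endpoint"].lstrip("/")
print(lineage_url)
```

The printed lines can be pasted into the environment section of the Compose file; only the URL needs changing if Marquez is reachable under a different hostname from inside the Airflow containers.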