Jed Rembold
July 30, 2025
To get started, in our config we will want to deactivate confluent_kafka’s default automatic committing
This ensures that we are purely in control of what offset(s) we begin reading events from
from confluent_kafka import Consumer, TopicPartition

config = {
    'bootstrap.servers': |||your server|||,
    'group.id': |||your group|||,
    'enable.auto.commit': False
}
con = Consumer(config)
metadata = con.list_topics(topic=|||your topic|||, timeout=10)
# partitions is a dict keyed by partition id
partitions = metadata.topics[|||your topic|||].partitions
offsets_for_times looks up offsets for a TopicPartition object (or list of them)
# offsets_for_times expects a list, so wrap the single TopicPartition and unwrap the result
example = con.offsets_for_times(
    [TopicPartition(|||your topic|||, |||chosen partition|||, |||kafka timestamp|||)],
    timeout=10)[0]
Don’t forget that Kafka uses timestamps in a Unix format, but in milliseconds
If you have a pendulum DateTime object, then you need to convert to a timestamp and then multiply by 1000
import pendulum

normal_time = pendulum.now()
kafka_time = int(normal_time.timestamp() * 1000)
DateTime objects usually track to the microsecond, so you’ll have some decimals left over that you should truncate with int
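The conversion above can be wrapped in a pair of small helpers. This is only a sketch: the names to_kafka_ms and from_kafka_ms are made up for illustration, and it uses the standard library datetime so it runs without pendulum installed (a pendulum DateTime is a datetime subclass, so it can be passed in as well). Rejecting naive datetimes is a choice made here, not something Kafka requires.

```python
from datetime import datetime, timezone


def to_kafka_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to a Unix timestamp in milliseconds."""
    if moment.tzinfo is None:
        # Assumption of this sketch: refuse naive datetimes so local time never sneaks in
        raise ValueError("expected a timezone-aware datetime")
    # timestamp() is in seconds with a fractional part; int() truncates leftover decimals
    return int(moment.timestamp() * 1000)


def from_kafka_ms(kafka_ms: int) -> datetime:
    """Convert a millisecond Unix timestamp back to a UTC datetime."""
    return datetime.fromtimestamp(kafka_ms / 1000, tz=timezone.utc)


# 1.5 seconds after the Unix epoch
moment = datetime(1970, 1, 1, 0, 0, 1, 500_000, tzinfo=timezone.utc)
print(to_kafka_ms(moment))  # 1500
```

Going the other way is handy when checking that a looked-up offset really lands near the time you asked for.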
offsets_for_times returns TopicPartition(s) to you
In some cases, that is exactly what you want
In other cases, you may want to just extract the offset(s) from the TopicPartition
Doing so is simple:
desired_offset = example.offset
You can of course loop over a list of TopicPartitions to extract all the offsets
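Looping over a list of TopicPartitions might look like the sketch below. To keep it runnable without a broker, it uses a stand-in class (StandInTopicPartition, invented for this example) that has the same topic, partition, and offset attributes as confluent_kafka's TopicPartition. The helper name offsets_by_partition and the topic name "events" are also just illustrative.

```python
from typing import NamedTuple


class StandInTopicPartition(NamedTuple):
    """Stand-in for confluent_kafka.TopicPartition (same three attributes)."""
    topic: str
    partition: int
    offset: int


def offsets_by_partition(topic_partitions):
    """Collect offsets into a dict keyed by (topic, partition)."""
    return {(tp.topic, tp.partition): tp.offset for tp in topic_partitions}


# Pretend these came back from a timestamp lookup
looked_up = [
    StandInTopicPartition("events", 0, 120),
    StandInTopicPartition("events", 1, 98),
]
print(offsets_by_partition(looked_up))
```

Keying by (topic, partition) rather than keeping a bare list of numbers means each offset stays attached to the partition it belongs to.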
Assign the TopicPartition to the consumer
This might be a single TopicPartition or it might be a list of TopicPartitions
|||consumer|||.assign(|||your single or list of topic partitions|||)
The poll loop looks very similar
Use offsets_for_times to generate TopicPartition(s) for both the start and end times
TopicPartition. The starts are already good to go.
UNION step is important
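One way to read only the events between a start and an end time is to assign the start TopicPartitions and then stop polling each partition once it reaches its end offset. The sketch below assumes the consumer has already been assigned its start positions, and that end_offsets is a dict of (topic, partition) to the first offset that should NOT be read. The names consume_until, end_offsets, and max_idle_polls are hypothetical. It only relies on poll() on the consumer and on topic(), partition(), offset(), and error() on each message.

```python
def consume_until(consumer, end_offsets, poll_timeout=1.0, max_idle_polls=5):
    """Yield messages until every tracked partition reaches its end offset.

    end_offsets maps (topic, partition) -> first offset to exclude.
    Gives up after max_idle_polls empty polls in a row, so a quiet
    partition cannot keep the loop alive forever.
    """
    remaining = set(end_offsets)
    idle_polls = 0
    while remaining and idle_polls < max_idle_polls:
        msg = consumer.poll(poll_timeout)
        if msg is None:
            # Nothing arrived within the timeout
            idle_polls += 1
            continue
        idle_polls = 0
        if msg.error():
            raise RuntimeError(msg.error())
        key = (msg.topic(), msg.partition())
        if key not in remaining:
            # Partition already finished (or was never tracked): skip it
            continue
        if msg.offset() >= end_offsets[key]:
            # Reached the end of the window for this partition
            remaining.discard(key)
            continue
        yield msg
```

Usage would be along the lines of `for msg in consume_until(con, end_offsets): ...`, with the usual processing of msg.value() inside the loop.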

services:
  prometheus:
    image: prom/prometheus
    ports:
      - 9090:9090
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prom_data:/prometheus
    user: 1000:1000
Because Prometheus pulls data to it, it needs a configuration file to tell it what it should be pulling and how often
This should be in the same folder as your docker-compose.yml file (at least as set up there)
A highly basic configuration might initially look like:
global:
  scrape_interval: 15s # How frequently to scrape
  evaluation_interval: 15s # for rules/alerts
scrape_configs:
  - job_name: 'prometheus' # Name of the scraping job
    static_configs:
      - targets: ['localhost:9090'] # API endpoint

services:
  node-exporter:
    image: quay.io/prometheus/node-exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'
All launching the exporter container does is make the metrics available; we still need to tell Prometheus to grab these new values
Under scrape_configs, we just need to add another entry:
...
scrape_configs:
  ...
  - job_name: 'vm_node'
    static_configs:
      - targets: ['example.advde:9100']
1860 and Load!
What if we want more details about specifically what is happening in our database?
There exists a Postgres exporter!
services:
  postgres-exporter:
    image: prometheuscommunity/postgres-exporter
    environment:
      DATA_SOURCE_NAME: "postgresql://user:pass@host:port/dbname?sslmode=disable"
    ports:
      - 9187:9187
Don’t forget to update the Prometheus config!
...
scrape_configs:
  ...
  - job_name: 'postgres'
    static_configs:
      - targets: ['example.advde:9187']
Getting Airflow metrics into Prometheus requires a bit of a middle step
Airflow has built in ways of exporting metrics to a system called StatsD
We can effectively turn this on, and then set up a statsd-exporter that makes metrics available in a Prometheus format
The easiest way to turn it on is to set 3 environment variables in Airflow’s docker-compose.yml:
AIRFLOW__METRICS__STATSD_ON: 'true'
AIRFLOW__METRICS__STATSD_PORT: 9125
AIRFLOW__METRICS__STATSD_HOST: 'example.advde'
The StatsD Exporter benefits from a cheat-sheet for how it can translate from statsd to Prometheus names
Put statsd_mapping.yml in the same location as prometheus.yml
Then we can set up the exporter in Docker Compose
services:
  statsd-exporter:
    image: prom/statsd-exporter
    ports:
      - 9102:9102 # Where to access metrics
      - 9125:9125 # Incoming metrics
      - 9125:9125/udp
    volumes:
      - ./statsd_mapping.yml:/tmp/statsd_mapping.yml
    command:
      - '--statsd.mapping-config=/tmp/statsd_mapping.yml'
There is nothing magical about the new Prometheus config:
...
scrape_configs:
  ...
  - job_name: 'airflow'
    static_configs:
      - targets: ['example.advde:9102']
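After adding scrape jobs, it can be worth confirming that Prometheus is actually reaching each target. A rough sketch, assuming Prometheus is reachable at http://localhost:9090 as in the compose file above: ask the HTTP API for the built-in up metric, which is 1 for every target scraped successfully and 0 otherwise. The helper names up_query_url, jobs_up, and check_jobs are invented for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def up_query_url(base_url: str) -> str:
    """Build the instant-query URL for the built-in `up` metric."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': 'up'})}"


def jobs_up(payload: dict) -> dict:
    """Map each job name to True only if all of its targets report up == 1."""
    status = {}
    for sample in payload["data"]["result"]:
        job = sample["metric"].get("job", "unknown")
        # Each sample's value is [timestamp, "<value as string>"]
        is_up = sample["value"][1] == "1"
        status[job] = status.get(job, True) and is_up
    return status


def check_jobs(base_url: str = "http://localhost:9090") -> dict:
    """Query a running Prometheus (assumed address) and summarise job health."""
    with urlopen(up_query_url(base_url), timeout=10) as response:
        return jobs_up(json.load(response))


# Example, with Prometheus running:
#   print(check_jobs())
```

A job that shows False (or is missing entirely) usually points to a typo in the target address or an exporter container that is not running.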