Case study

Azure Functions

Two Python functions decouple alert generation from Discord delivery through Azure Service Bus — Pydantic-validated events and 5-minute dedup windows cut alert noise by ~80% during incidents.

cloud · serverless · April 2026

  • Azure Functions (Python v2 programming model)
  • Python
  • Service Bus
  • Timer Trigger
  • Pydantic
  • aiohttp

Context

When Prometheus fires an alert, something has to turn that event into a message a human will actually read. Baking notification logic into Alertmanager templates means every notification change is a configmap edit and an ArgoCD sync. I wanted a decoupled serverless layer that consumes incident events from a queue, validates them, and routes them to Discord — deployable and scalable on its own.

Architecture

Two functions on the Azure Functions Python v2 programming model, each with its own trigger. A health checker runs on a 15-minute timer, polls each service's health endpoint, and publishes an incident event to Service Bus when a check fails. An incident processor subscribes to the incident-events topic, validates and deduplicates each event, and posts a Discord embed. Service Bus is the decoupling boundary: Alertmanager and the health checker produce, and the processor is the sole consumer. Adding Slack or PagerDuty later means writing a new consumer, not touching the producers.

Figure: Azure Functions incident pipeline, producers into Service Bus, processor out to Discord. Health checker (timer, every 15 min) and Alertmanager (Prometheus alerts) → Azure Service Bus (incident-events topic) → Incident processor (Pydantic validation, 5-min dedup, 3 retries with backoff) → Discord webhook (incident embeds).
Producers never know about Discord; the queue is the contract.
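A minimal sketch of the producer side of that contract, assuming the health checker collects one HTTP status per service and serializes each failure as a JSON message body. The key names (including alert_rule), the rule identifier, and the fixed "critical" severity are hypothetical; the source only lists the required fields. Publishing to the topic (through a Service Bus output binding or the azure-servicebus SDK) is left out so the example runs on the standard library alone.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional

# Hypothetical rule identifier; the source does not name the health-check rule.
HEALTH_RULE = "health-endpoint-failed"


def build_incident_events(statuses: Dict[str, int],
                          now: Optional[datetime] = None) -> List[str]:
    """Turn one round of health-check results into JSON message bodies.

    `statuses` maps service name -> HTTP status returned by its health endpoint.
    Only failures (anything other than 200) produce an event.
    """
    ts = (now or datetime.now(timezone.utc)).isoformat()
    bodies = []
    for service, status in sorted(statuses.items()):
        if status == 200:
            continue  # healthy: nothing goes on the topic
        event = {
            "service": service,
            "alert_rule": HEALTH_RULE,
            "severity": "critical",  # lowercase, matching what the processor validates
            "timestamp": ts,
            "metric_value": status,
        }
        bodies.append(json.dumps(event))
    # In the deployed function, each body would then be published to the
    # incident-events topic; that wiring is omitted here.
    return bodies
```

In the deployed app this would run inside the timer-triggered function; Azure Functions timers use six-field NCRONTAB expressions, so a 15-minute schedule is written as 0 */15 * * * *.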

Key decisions & tradeoffs

  • Validate at the boundary. Every message is validated against a Pydantic model that enforces required fields (service, severity, timestamp, metric value); malformed payloads are dead-lettered instead of silently dropped. This caught a producer bug that sent severity as "WARNING" instead of "warning".
  • Deduplicate on (service, alert-rule). A sustained degradation can fire the same alert every 30 seconds; the 5-minute window acknowledges duplicates without forwarding them. During a 2-hour Istio sidecar memory leak that meant ~80% less noise — and, since Service Bus bills per operation, a smaller bill too.
  • Async webhooks. The Discord POST runs on aiohttp, so the worker's event loop isn't blocked while it waits on the round-trip. Failed posts are retried three times with exponential backoff, then dead-lettered for manual inspection.
  • Work with cold starts, not against them. A cold start adds 1–3 s after the app has been idle. That is fine for a 15-minute health check but painful on the alert path, so a heartbeat event every 5 minutes keeps the processor warm.
  • Pin Pydantic. The v1 to v2 migration broke field aliases that worked in v1, so the Pydantic version is pinned in requirements.txt.
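The boundary check can be sketched without dependencies. The real processor uses a Pydantic model (where severity would naturally be a Literal field); this stand-in reproduces the same kind of rules with the standard library so it runs anywhere. The allowed severity set, the alert_rule field (needed for dedup), and the InvalidIncident exception are assumptions; the processor would catch InvalidIncident and dead-letter the message rather than drop it.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Assumptions: exact key names and allowed severity values are not in the source.
ALLOWED_SEVERITIES = {"critical", "warning", "info"}
REQUIRED = ("service", "alert_rule", "severity", "timestamp", "metric_value")


class InvalidIncident(ValueError):
    """Raised for payloads the processor should dead-letter instead of dropping."""


@dataclass(frozen=True)
class IncidentEvent:
    service: str
    alert_rule: str
    severity: str
    timestamp: datetime
    metric_value: float


def parse_incident(body: str) -> IncidentEvent:
    """Validate a raw message body; raise InvalidIncident on any schema violation."""
    try:
        raw = json.loads(body)
    except json.JSONDecodeError as exc:
        raise InvalidIncident(f"not valid JSON: {exc}") from exc
    if not isinstance(raw, dict):
        raise InvalidIncident("payload must be a JSON object")
    missing = [name for name in REQUIRED if name not in raw]
    if missing:
        raise InvalidIncident(f"missing required fields: {missing}")
    for name in ("service", "alert_rule"):
        if not isinstance(raw[name], str) or not raw[name]:
            raise InvalidIncident(f"{name} must be a non-empty string")
    # Exact match, no case folding: "WARNING" is rejected, so a producer
    # sending the wrong case gets noticed instead of silently accepted.
    severity = raw["severity"]
    if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
        raise InvalidIncident(f"unknown severity: {severity!r}")
    try:
        timestamp = datetime.fromisoformat(raw["timestamp"])
        metric_value = float(raw["metric_value"])
    except (TypeError, ValueError) as exc:
        raise InvalidIncident(f"bad timestamp or metric_value: {exc}") from exc
    return IncidentEvent(raw["service"], raw["alert_rule"], severity,
                         timestamp, metric_value)
```

Raising a dedicated exception keeps "reject" distinct from "crash": the caller can dead-letter on InvalidIncident while letting genuine bugs surface as errors.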
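A minimal sketch of the (service, alert-rule) dedup, assuming the window is measured from the last forwarded alert, so a sustained problem still resurfaces every 5 minutes instead of going silent. The class name and the use of plain float seconds for time are hypothetical. State lives in an in-process dict, which only holds within one warm instance; the source does not say where the dedup state is kept, and a scaled-out app would need a shared store.

```python
from typing import Dict, Tuple

WINDOW_SECONDS = 300.0  # the 5-minute dedup window


class DedupWindow:
    """Forward the first alert per (service, alert_rule); suppress repeats for a window.

    Hypothetical helper. Times are plain float seconds (for example the event
    timestamp converted to epoch seconds).
    """

    def __init__(self, window_seconds: float = WINDOW_SECONDS) -> None:
        self.window = window_seconds
        # Key -> time the last alert for that key was actually forwarded.
        self._last_forwarded: Dict[Tuple[str, str], float] = {}

    def should_forward(self, service: str, alert_rule: str, now: float) -> bool:
        key = (service, alert_rule)
        last = self._last_forwarded.get(key)
        if last is not None and now - last < self.window:
            return False  # duplicate inside the window: acknowledge, don't post
        self._last_forwarded[key] = now  # start a new window from this alert
        return True
```

For example, one key firing every 30 seconds for 10 minutes produces 20 events, of which this sketch forwards only the ones at t=0 and t=300; a different service or rule gets its own independent window.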
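The retry policy can be sketched as one initial POST plus up to three retries with doubling delays (1 s, 2 s, 4 s at the default base); that reading of "retry three times" is an assumption. The sender is injected as an async callable so the logic runs without aiohttp or a live webhook; post_with_retries and its parameters are hypothetical. When every attempt fails, the last exception propagates and the message is not completed; the source says it ends up dead-lettered but not exactly how.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical type: any async callable that POSTs the payload and raises on failure.
Sender = Callable[[Dict[str, Any]], Awaitable[None]]


async def post_with_retries(send: Sender, payload: Dict[str, Any],
                            retries: int = 3, base_delay: float = 1.0) -> int:
    """Try once, then retry up to `retries` times with exponential backoff.

    Returns the number of attempts used. If every attempt fails, the last
    exception is re-raised so the message is not completed.
    """
    attempt = 0
    while True:
        try:
            await send(payload)
            return attempt + 1
        except Exception:  # in real code, narrow to aiohttp.ClientError / asyncio.TimeoutError
            if attempt >= retries:
                raise  # retries exhausted: let the failure reach the trigger
            await asyncio.sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s by default
            attempt += 1


# A real sender with aiohttp might look like this (WEBHOOK_URL is a placeholder,
# payload e.g. {"embeds": [{"title": "...", "description": "..."}]}):
#
# async def send_to_discord(payload):
#     async with aiohttp.ClientSession() as session:
#         async with session.post(WEBHOOK_URL, json=payload) as resp:
#             resp.raise_for_status()
```

Injecting the sender keeps the backoff logic testable: with base_delay=0, a webhook that fails twice and then succeeds is exercised instantly.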

Measured outcomes

  • Functions deployed: 2 (timer + Service Bus trigger)
  • Health checks: 96/day per service (15-min interval)
  • Alert noise cut: ~80% (5-min dedup, during incidents)
  • Webhook retries: 3 (exponential backoff, then DLQ)