
Transactional Email

The DVARA Flightdeck pod is the only place that sends outbound email — every other pod publishes an EmailRequestedEvent and lets the Flightdeck-side EmailDeliveryListener render the template and dispatch it. This page covers the pieces an operator needs to reason about: picking a transport, what survives a transient failure, what fails fast, and where to look when something goes missing.

Producers

These flows publish emails (rendered + delivered by Flightdeck):

| Producer | Template | Trigger |
|---|---|---|
| Built-in auth: invitations | invitation.html | Owner creates a user via POST /users |
| Built-in auth: password reset | password-reset.html | User submits /forgot-password |
| Built-in auth: email verification | verification.html | First-run /setup and /register flows |
| SaaS-mode signup | welcome.html | /signup (trial) and Stripe checkout.session.completed (paid) |
| Threshold notifications | threshold-warn-80.html / threshold-soft-100.html / threshold-hard-110.html | Tenant crosses 80% / 100% / 110% of monthly token cap (three distinct templates, one per band) |
| Chronic-abuse suspension | chronic-suspension.html (plus chronic-founder.html for founder-tier escalation) | Tenant suspended after consecutive months over cap |

All producers go through the same publish path. A producer never blocks on transport: EmailRequestedEvent is published synchronously, and the listener handles delivery on a separate thread.

Transports

Pick one transport with dvara.flightdeck.email.transport:

| Value | Behavior | Use when |
|---|---|---|
| log (default) | Logs a one-line summary + every CTA link on its own line at INFO; full HTML body at DEBUG | Dev, CI, demos, first-day soft launch |
| smtp | Delivers via Spring's JavaMailSender using spring.mail.* properties | Self-managed deployment with corporate SMTP |
| resend | POSTs to resend.com's transactional API | SaaS / managed deployments wanting deliverability + bounce handling |

Every send under the log transport prints a short block to Flightdeck's stdout: the summary line first, then one link: … line per actionable URL. To pull the links out of the log stream, grep for link: with one line of leading context:

docker compose logs dvara-flightdeck | grep -B1 'link:'

Sample output:

Email (log transport) | to=alice@example.com subject=You've been invited to DVARA htmlChars=4690
link: http://localhost:8090/register?token=ef726772-b312-4c40-b4a9-3b1ec7adb33f

Copy the URL into your browser to complete onboarding. Same pattern for password-reset (/reset-password?token=…) and email verification — anything with a ?token=… parameter is surfaced. Non-actionable links in the template (brand footer, docs, mailto: support) are deliberately not echoed.
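The filtering rule above can be illustrated with a minimal sketch. It is not Flightdeck's LogTransport code: the function names, the href regex, and the "has a token query parameter" test are assumptions made for illustration, and the summary format simply mirrors the sample output.

```python
import re
from html import unescape
from urllib.parse import parse_qs, urlsplit

# Hypothetical rule for this sketch: a link is actionable when its query
# string carries a non-empty "token" parameter.
HREF_RE = re.compile(r'href="([^"]+)"')


def actionable_links(html_body):
    """Return hrefs that carry a token query parameter, in document order."""
    links = []
    for raw in HREF_RE.findall(html_body):
        url = unescape(raw)  # attribute values may contain &amp;
        if "token" in parse_qs(urlsplit(url).query):
            links.append(url)
    return links


def log_block(to, subject, html_body):
    """Build the summary line plus one 'link: ' line per actionable URL."""
    summary = (
        f"Email (log transport) | to={to} subject={subject} "
        f"htmlChars={len(html_body)}"
    )
    return [summary] + [f"link: {url}" for url in actionable_links(html_body)]
```

With a body that contains a registration link, a docs link, and a mailto: link, only the registration link produces a link: line, which is why the grep above stays short even for templates with a busy footer.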

The rendered HTML body is not logged at INFO — it's 4–5 KB of Thymeleaf per send and would dominate the log stream. If you need to inspect the full body, switch the LogTransport logger to DEBUG.

transport=resend — production checklist

  1. Sign up at resend.com and create an API key. Choose the Full access scope (Flightdeck needs POST /emails and GET /domains).
  2. Verify your sender domain at resend.com/domains. Configure the DKIM + Return-Path records, wait for status to flip to verified. The sandbox sender onboarding@resend.dev skips verification but customer-facing copy reads from: onboarding@resend.dev — fine for soft launch, not for marketing-clean GA.
  3. Set:
    DVARA_FLIGHTDECK_EMAIL_TRANSPORT=resend
    DVARA_FLIGHTDECK_EMAIL_FROM=noreply@yourdomain.com
    DVARA_FLIGHTDECK_EMAIL_RESEND_API_KEY=re_…
  4. By default (dvara.flightdeck.email.resend.verify-domain-at-startup=true), Flightdeck calls GET /domains at boot and refuses to start on a production-class profile if the sender domain isn't verified on the Resend side. The sandbox sender skips this check. In air-gapped / no-egress environments, disable the check by setting dvara.flightdeck.email.resend.verify-domain-at-startup=false.
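The startup check in step 4 can be pictured as a small decision function. This is a sketch under stated assumptions, not the actual startup code: the names are hypothetical, verified_domains stands in for whatever Flightdeck extracts from the GET /domains response, and treating an unverified domain on a non-production profile as a warning is a guess, since the text only states what happens on production-class profiles.

```python
from enum import Enum


class StartupDecision(Enum):
    PROCEED = "proceed"
    PROCEED_WITH_WARNING = "proceed_with_warning"  # assumed non-production behavior
    REFUSE_TO_START = "refuse_to_start"


SANDBOX_SENDER = "onboarding@resend.dev"


def check_sender_domain(from_address, verified_domains,
                        production_profile, verify_at_startup=True):
    """Decide whether boot may continue for the configured sender address."""
    # The check is skipped when disabled or when the sandbox sender is used.
    if not verify_at_startup or from_address.lower() == SANDBOX_SENDER:
        return StartupDecision.PROCEED
    domain = from_address.rsplit("@", 1)[-1].lower()
    if domain in {d.lower() for d in verified_domains}:
        return StartupDecision.PROCEED
    if production_profile:
        return StartupDecision.REFUSE_TO_START
    return StartupDecision.PROCEED_WITH_WARNING
```

The point of the sketch is the ordering: the opt-out flag and the sandbox sender short-circuit before any domain comparison, so a no-egress install never needs the Resend call to succeed.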

Common transport vars

| Property | Env Var | Default | Description |
|---|---|---|---|
| dvara.flightdeck.email.from | DVARA_FLIGHTDECK_EMAIL_FROM | noreply@dvarahq.com | Sender address on every outbound |
| dvara.flightdeck.email.transport | DVARA_FLIGHTDECK_EMAIL_TRANSPORT | log | log, smtp, or resend |
| dvara.flightdeck.email.public-endpoint-url | DVARA_FLIGHTDECK_EMAIL_PUBLIC_ENDPOINT_URL | https://api.dvarahq.com/v1 | Data-plane URL shown in welcome + check-email pages |
| dvara.flightdeck.email.flightdeck-url | DVARA_FLIGHTDECK_EMAIL_FLIGHTDECK_URL | https://flightdeck.dvarahq.com | Flightdeck base URL for welcome + reset CTAs |
| dvara.flightdeck.email.docs-url | DVARA_FLIGHTDECK_EMAIL_DOCS_URL | https://dvarahq.com/docs | Docs link in welcome email |
| dvara.flightdeck.email.resend-api-key | DVARA_FLIGHTDECK_EMAIL_RESEND_API_KEY | (empty) | Required when transport=resend |

Durability layer

Every send goes through the email_delivery_log PostgreSQL table, not directly out the transport. This buys four properties operators care about:

  1. Idempotency. A second publish of the same EmailRequestedEvent.idempotencyKey within the configured TTL is a no-op. A retried inbound webhook can't double-mail the customer.
  2. Retry with exponential backoff. Transport failures classified as transient (timeouts, 5xx, throttling) re-enter the queue. Permanent failures (4xx other than throttling responses, render errors) go to the DLQ immediately.
  3. Dead-letter queue. Exhausted retries land in email_delivery_log with state DEAD_LETTERED for operator review. Rows are retained for dlq-retention-days (default 30) and reaped by a nightly cron.
  4. Survive a pod restart. A send in flight when Flightdeck restarts is picked up by the retry sweeper on the next tick — no in-memory state to lose.
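The idempotency property in item 1 can be sketched as a time-windowed dedupe check. This is an in-memory illustration only: the real layer is backed by the email_delivery_log table, and the class and method names here are hypothetical. It models the dedupe window alone, not row retention or primary-key behavior.

```python
from datetime import datetime, timedelta


class IdempotencyWindow:
    """In-memory stand-in for the dedupe window (hypothetical helper)."""

    def __init__(self, ttl_minutes=60):
        self._ttl = timedelta(minutes=ttl_minutes)
        self._first_seen = {}  # idempotency key -> timestamp of accepted publish

    def should_send(self, key, now):
        """Return False when the same key was accepted inside the TTL window."""
        first = self._first_seen.get(key)
        if first is not None and now - first < self._ttl:
            return False  # duplicate publish: treated as a no-op
        self._first_seen[key] = now
        return True


window = IdempotencyWindow(ttl_minutes=60)
t0 = datetime(2024, 1, 1, 12, 0)
print(window.should_send("invite-42", t0))                          # True
print(window.should_send("invite-42", t0 + timedelta(minutes=5)))   # False
```

A retried inbound webhook that republishes the same key five minutes later falls inside the window and is dropped, which is the double-mail protection described above.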

Set dvara.flightdeck.email.delivery.enabled=false to fall back to fire-and-forget (no DB row, no idempotency, no retry, no DLQ). Useful for tests that don't want a Postgres dep — not recommended for any production-class install.

Retry schedule

The default schedule gives 5 attempts with exponential backoff, roughly 5m 30s in total before a send is dead-lettered:

attempt 1 → 0s (synchronous, in the listener)
attempt 2 → +30s (initial-backoff-ms)
attempt 3 → +60s (initial × multiplier^1)
attempt 4 → +120s (initial × multiplier^2, capped at max-backoff-ms)
attempt 5 → +120s (cap holds)
→ DEAD_LETTERED

Tune via delivery.initial-backoff-ms, delivery.max-backoff-ms, delivery.backoff-multiplier, and delivery.max-attempts. The retry sweeper polls every retry-sweep-interval-ms (default 30s) and processes up to retry-sweep-batch-size (default 100) due rows per tick — both are throughput dials for very high mail volumes.
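To preview the effect of a tuning change, the schedule can be recomputed from the documented formula delay(n) = min(initial × multiplier^(n-2), max) for n ≥ 2. The sketch below is a standalone calculator with hypothetical function names. It ignores sweeper granularity, so real retries can land later than the computed offsets.

```python
def backoff_ms(attempt, initial_ms=30_000, multiplier=2.0, max_ms=120_000):
    """Delay before the given attempt; attempt 1 is synchronous (0 ms)."""
    if attempt < 2:
        return 0
    return int(min(initial_ms * multiplier ** (attempt - 2), max_ms))


def schedule(max_attempts=5, **knobs):
    """Per-attempt delays for attempts 1..max_attempts."""
    return [backoff_ms(n, **knobs) for n in range(1, max_attempts + 1)]


delays = schedule()
print(delays)               # [0, 30000, 60000, 120000, 120000]
print(sum(delays) / 1000)   # 330.0 seconds, i.e. about 5m 30s
```

Raising max-backoff-ms to 240000 with the other defaults unchanged, for example, gives schedule(max_ms=240_000) == [0, 30000, 60000, 120000, 240000], stretching the window before DLQ to 7m 30s.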

Delivery knobs

All ten ship with sensible defaults. Tune them only when you have a specific reason, such as easing Resend rate-limit pressure, matching an SES SLA, or choosing more aggressive or gentler backoff.

| Property | Env Var | Default | Description |
|---|---|---|---|
| dvara.flightdeck.email.delivery.enabled | DVARA_FLIGHTDECK_EMAIL_DELIVERY_ENABLED | true | Master switch for the durability layer |
| dvara.flightdeck.email.delivery.max-attempts | DVARA_FLIGHTDECK_EMAIL_DELIVERY_MAX_ATTEMPTS | 5 | Sync attempt 1 + 4 async retries before DLQ |
| dvara.flightdeck.email.delivery.initial-backoff-ms | DVARA_FLIGHTDECK_EMAIL_DELIVERY_INITIAL_BACKOFF_MS | 30000 | Backoff before attempt 2 |
| dvara.flightdeck.email.delivery.max-backoff-ms | DVARA_FLIGHTDECK_EMAIL_DELIVERY_MAX_BACKOFF_MS | 120000 | Ceiling on any single retry's backoff |
| dvara.flightdeck.email.delivery.backoff-multiplier | DVARA_FLIGHTDECK_EMAIL_DELIVERY_BACKOFF_MULTIPLIER | 2.0 | Exponential factor: delay(n) = min(initial × multiplier^(n-2), max) for n ≥ 2 |
| dvara.flightdeck.email.delivery.retry-sweep-interval-ms | DVARA_FLIGHTDECK_EMAIL_DELIVERY_RETRY_SWEEP_INTERVAL_MS | 30000 | How often the retry sweeper polls |
| dvara.flightdeck.email.delivery.retry-sweep-batch-size | DVARA_FLIGHTDECK_EMAIL_DELIVERY_RETRY_SWEEP_BATCH_SIZE | 100 | Max rows processed per sweeper tick |
| dvara.flightdeck.email.delivery.idempotency-ttl-minutes | DVARA_FLIGHTDECK_EMAIL_DELIVERY_IDEMPOTENCY_TTL_MINUTES | 60 | Dedupe window: a second publish of the same idempotencyKey inside the window is a no-op |
| dvara.flightdeck.email.delivery.dlq-retention-days | DVARA_FLIGHTDECK_EMAIL_DELIVERY_DLQ_RETENTION_DAYS | 30 | How long SENT + DEAD_LETTERED rows are kept |
| dvara.flightdeck.email.delivery.cleanup-cron | DVARA_FLIGHTDECK_EMAIL_DELIVERY_CLEANUP_CRON | 0 0 3 * * * | Nightly DLQ + idempotency purge (default 03:00 UTC). PENDING_RETRY rows are never touched. |

Idempotency-key collision caveat: the dedupe window is idempotency-ttl-minutes, but DLQ rows are retained for dlq-retention-days. A producer that reuses a deterministic UUID across the retention boundary will silently lose the audit row on a primary-key collision, so such producers should generate a fresh key instead of reusing the old one.

Observability

Every send + retry + DLQ transition emits both an audit event and a Prometheus counter increment — no extra wiring.

Audit events

| Event type | When |
|---|---|
| EMAIL_SENT | Transport accepted the message (sync attempt 1 or any retry) |
| EMAIL_FAILED | Send failed. Payload carries terminal (true = DLQ'd, false = will retry) and result (TRANSIENT / PERMANENT / MAX_ATTEMPTS_EXCEEDED) |
| EMAIL_RETRIED | A retry attempt is being made. Fires before the transport call so the timeline reflects intent even if the transport throws unexpectedly |

Every audit row carries template, transport, recipient, tenantId, and attempt. Recipient email is in the payload; the rendered HTML body is not — audit retention is for compliance, not for replaying customer mail.
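How a transport outcome might translate into the result and terminal fields can be sketched as follows. The mapping is an assumption made for illustration, not the listener's actual logic: treating HTTP 429 as throttling, treating every other non-2xx status below 500 as permanent, and reporting the final failed retry as MAX_ATTEMPTS_EXCEEDED are all guesses consistent with the text, and the function names are hypothetical.

```python
from enum import Enum


class Result(Enum):
    SUCCESS = "SUCCESS"
    TRANSIENT = "TRANSIENT"
    PERMANENT = "PERMANENT"
    MAX_ATTEMPTS_EXCEEDED = "MAX_ATTEMPTS_EXCEEDED"


def classify_http(status, timed_out=False):
    """Assumed mapping: timeouts, 429 and 5xx are transient; other failures permanent."""
    if timed_out:
        return Result.TRANSIENT
    if 200 <= status < 300:
        return Result.SUCCESS
    if status == 429 or status >= 500:
        return Result.TRANSIENT
    return Result.PERMANENT


def audit_event(result, attempt, max_attempts=5):
    """Build a minimal audit payload for one send attempt."""
    if result is Result.SUCCESS:
        return {"type": "EMAIL_SENT", "attempt": attempt}
    if result is Result.TRANSIENT and attempt >= max_attempts:
        result = Result.MAX_ATTEMPTS_EXCEEDED  # retries exhausted
    return {
        "type": "EMAIL_FAILED",
        "attempt": attempt,
        "result": result.value,
        "terminal": result is not Result.TRANSIENT,  # true = DLQ'd
    }
```

Under these assumptions a 503 on attempt 2 yields terminal=false, while the same 503 on attempt 5 yields MAX_ATTEMPTS_EXCEEDED with terminal=true.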

Prometheus metrics

dvara_emails_sent_total{template, transport, result}
dvara_emails_retried_total{template, attempt}

result is SUCCESS, TRANSIENT, PERMANENT, or MAX_ATTEMPTS_EXCEEDED. The two counters cover both the headline delivery rate and the retry-pressure signal that tells you when to widen the backoff or cap.

Useful dashboard queries:

# Send-success rate (golden signal); sum() aggregates away the labels so the ratio spans all series
sum(rate(dvara_emails_sent_total{result="SUCCESS"}[5m]))
/ sum(rate(dvara_emails_sent_total[5m]))

# DLQ pressure — non-zero means customers are missing email
rate(dvara_emails_sent_total{result="MAX_ATTEMPTS_EXCEEDED"}[1h])

# Retry-storm signal — sustained increase = transport degraded
rate(dvara_emails_retried_total[5m])

Operational SQL (recovery + audit)

For day-to-day onboarding under transport=log, prefer the log-grep path above — internal table names and JSONB key paths are platform-implementation details you shouldn't need to learn just to grab a registration link.

These SQL paths are for recovery / audit work that has no log-side equivalent: inspecting the DLQ, replaying a dead-lettered row, answering "did this customer's invitation actually deliver three weeks ago?"

Look at the DLQ:

SELECT id, template, recipient, attempt_count, last_error, updated_at
FROM dvara_main.email_delivery_log
WHERE state = 'DEAD_LETTERED'
ORDER BY updated_at DESC
LIMIT 50;

Replay a DLQ row (manual — there's no /v1/admin/email/replay endpoint by design; replaying a DLQ row is an operator decision, not a self-service one):

UPDATE dvara_main.email_delivery_log
SET state = 'PENDING_RETRY',
    next_attempt_at = NOW(),
    last_error = NULL
WHERE id = 'the-dlq-row-id-here'
  AND state = 'DEAD_LETTERED';  -- guard: only re-queue rows that are actually dead-lettered

The retry sweeper will pick it up on the next tick.

Choosing a transport

| Profile | Recommended |
|---|---|
| Local dev, CI, smoke tests | log |
| Self-hosted with corporate SMTP | smtp |
| SaaS / managed deployment | resend (or smtp if you front your own SES / SendGrid) |
| Air-gapped / no-egress | log with manual operator escalation, or smtp to an internal relay |

The default log transport is deliberately safe — a fresh install sends nothing to anyone until an operator explicitly picks smtp or resend. There is no built-in transport that calls out to a third party on first boot.

Migration note

Properties under dvara.flightdeck.email.resend.retry-* (retry-max-attempts, retry-initial-backoff-ms, retry-max-backoff-ms) on the legacy Resend transport are defunct as of 1.0.0-GA — retry now lives at the listener level via dvara.flightdeck.email.delivery.*. The legacy names are kept for one release with a deprecation WARN; remove them from your .env and use the delivery.* namespace.