Good observability does not mean collecting everything. It means collecting the right evidence before the incident starts.

Log business events

Use stable event names such as payment.capture.requested, invoice.reconciled, or queue.consumer.throttled. Include the trace ID, actor type, resource ID, idempotency key, decision, and outcome in every event so that records from separate services can be joined during an investigation.
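As a minimal sketch of what such an event could look like in code, the Python below builds one JSON record per business event and rejects events that are missing a join field. The field names and the build_event and log_event helpers are assumptions made for illustration, not an established schema.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical field names derived from the list above; adapt to your own schema.
REQUIRED_FIELDS = (
    "trace_id",
    "actor_type",
    "resource_id",
    "idempotency_key",
    "decision",
    "outcome",
)


def build_event(name: str, **fields) -> dict:
    """Return a structured event, refusing to build one that cannot be joined later."""
    missing = [field for field in REQUIRED_FIELDS if field not in fields]
    if missing:
        raise ValueError(f"event {name!r} is missing fields: {missing}")
    return {
        "event": name,
        "ts": datetime.now(timezone.utc).isoformat(),
        **fields,
    }


def log_event(logger: logging.Logger, name: str, **fields) -> dict:
    """Emit the event as a single JSON line so log tooling can index each field."""
    record = build_event(name, **fields)
    logger.info(json.dumps(record, sort_keys=True))
    return record
```

A call such as log_event(logger, "payment.capture.requested", trace_id=..., actor_type="service", ...) then produces one line per event. Failing loudly on a missing field is a design choice: it surfaces contract violations in development rather than during an incident.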

Measure symptoms and causes

A queue lag alert is useful on its own, but it is far more actionable when shown next to consumer error rate, lock wait time, dependency latency, and producer throughput. The dashboard should help an engineer decide whether to drain, throttle, roll back, or escalate.
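One way to picture this pairing of symptom and causes is the rough Python sketch below. It keeps the lag reading and the candidate cause signals in one snapshot and reports which causes are out of bounds when lag is high. The QueueSnapshot type, the suspect_causes function, and every threshold value are hypothetical and would need tuning for a real system.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QueueSnapshot:
    lag_seconds: float            # the symptom
    consumer_error_rate: float    # fraction of failed messages, 0.0 to 1.0
    lock_wait_ms: float
    dependency_latency_ms: float
    producer_msgs_per_s: float


# Hypothetical limits for illustration only; real values depend on the workload.
THRESHOLDS = {
    "consumer_error_rate": 0.05,
    "lock_wait_ms": 200.0,
    "dependency_latency_ms": 500.0,
    "producer_msgs_per_s": 1000.0,
}


def suspect_causes(snapshot: QueueSnapshot, max_lag_seconds: float = 60.0) -> list[str]:
    """Return the cause signals above their limit, but only when lag is actually high."""
    if snapshot.lag_seconds <= max_lag_seconds:
        return []
    values = asdict(snapshot)
    return [name for name, limit in THRESHOLDS.items() if values[name] > limit]
```

The function does not choose between draining, throttling, rolling back, or escalating. It only narrows the list of suspects so that the engineer makes that decision with the relevant signals side by side.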

Protect sensitive data

The logging contract should explicitly ban secrets, payment tokens, raw authorization headers, and unclassified user payloads. Production evidence should make incidents clearer without becoming a security incident of its own.
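A contract like this can be enforced at the point where fields are written. The Python sketch below assumes a small denylist for known-sensitive keys and an allowlist for classified fields; the key names and the sanitize helper are illustrative assumptions, not a complete policy.

```python
# Hypothetical key names; a real contract would enumerate its own.
BANNED_KEYS = {"password", "secret", "api_key", "payment_token", "authorization"}
ALLOWED_KEYS = {
    "event",
    "trace_id",
    "actor_type",
    "resource_id",
    "idempotency_key",
    "decision",
    "outcome",
}

REDACTED = "[REDACTED]"


def sanitize(fields: dict) -> dict:
    """Return a copy of fields that is safe to log under the contract above."""
    clean = {}
    for key, value in fields.items():
        normalized = key.lower()
        if normalized in BANNED_KEYS:
            # Keep the key so its presence is visible, but never the value.
            clean[key] = REDACTED
        elif normalized in ALLOWED_KEYS:
            clean[key] = value
        # Any other key is unclassified and is dropped rather than logged.
    return clean
```

Dropping unknown keys by default means a newly added payload field stays out of the logs until someone classifies it, which fails in the safe direction.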