Automated Secret Rotation Patterns
Rotation that only happens after a breach is not rotation. The goal is to make credential replacement routine and invisible: the secret changes on a schedule, and not a single request fails. The mechanism is a dual-credential overlap window — both the old and new credentials work simultaneously while traffic shifts over.
Rotation is the operational backbone of the enterprise secrets section. The pattern applies whether the store is AWS Secrets Manager or HashiCorp Vault.
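The ordering behind the overlap window can be sketched with an in-memory stand-in. `FakeCredentialStore`, `rotate`, and the `drain` callback are hypothetical names for illustration, not part of AWS Secrets Manager or Vault; a real store performs these steps through its own rotation mechanism.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeCredentialStore:
    """Hypothetical in-memory stand-in for the passwords a backend accepts."""

    accepted: set[str] = field(default_factory=set)

    def provision(self, password: str) -> None:
        self.accepted.add(password)

    def revoke(self, password: str) -> None:
        self.accepted.discard(password)

    def authenticate(self, password: str) -> bool:
        return password in self.accepted


def rotate(
    store: FakeCredentialStore, old: str, new: str, drain: Callable[[], None]
) -> None:
    store.provision(new)  # 1. new credential becomes valid; old still works
    drain()               # 2. overlap window: wait until clients use the new one
    store.revoke(old)     # 3. revoke the old credential last
```

During `drain()` both credentials authenticate, which is what lets traffic shift without a failed request.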
Secure implementation
# secrets/rotation.py
import time

from pydantic import SecretStr


class RotatingSecret:
    """Serves a secret from cache, refreshing within the overlap window."""

    def __init__(self, fetch, ttl: int = 300):
        self._fetch = fetch  # callable returning the current SecretStr
        self._ttl = ttl  # must be shorter than the overlap window
        self._value: SecretStr | None = None
        self._loaded_at = 0.0

    def get(self) -> SecretStr:
        if self._value is None or time.monotonic() - self._loaded_at > self._ttl:
            self._value = self._fetch()  # picks up the rotated value
            self._loaded_at = time.monotonic()
        return self._value


def reconnect_pool(pool, secret: RotatingSecret) -> None:
    pool.recreate(password=secret.get().get_secret_value())  # rebuild on rotation
The cache TTL is deliberately shorter than the store’s overlap window, so the application always picks up the new credential while the old one is still accepted.
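The reasoning behind that constraint: a process that refreshed its cache just before the rotation keeps serving the old value for up to one full TTL, so the old credential has to stay valid at least that long. A minimal sketch of the check follows; `worst_case_staleness`, `overlap_is_safe`, and the `margin` parameter (slack for fetch latency and clock skew) are assumptions for illustration.

```python
def worst_case_staleness(cache_ttl: float, margin: float = 0.0) -> float:
    """Longest time (seconds) a process may keep using the old credential."""
    return cache_ttl + margin


def overlap_is_safe(
    overlap_window: float, cache_ttl: float, margin: float = 0.0
) -> bool:
    """True when the old credential outlives every cached copy of it."""
    return overlap_window > worst_case_staleness(cache_ttl, margin)
```

With the default `ttl=300`, `overlap_is_safe(3600, 300)` holds, while `overlap_is_safe(300, 300)` does not, because a cached copy may still be in use at the moment the old credential is revoked.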
Configuration reference
| Element | Value | Why |
|---|---|---|
| overlap window | > cache TTL | Both credentials valid during switch |
| cache TTL | 5–15 min | Picks up new value promptly |
| rotation interval | 30–90 days (static) | Routine, scheduled |
| dynamic lease | minutes–hours | Self-revoking, smallest blast radius |
| SecretStr | always | Masked in logs |
Deployment parity: local to production
- Local dev — test rotation by forcing a refresh and asserting the app reconnects.
- CI — a test rotates a fake secret mid-run and verifies no request fails.
- Staging — trigger a real rotation and watch for reconnection errors before production.
- Production — rotation runs on schedule; alerts fire if a credential nears expiry un-rotated.
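The CI step above could look roughly like the following simulation. It is a sketch under stated assumptions: `FakeClock`, `CachedSecret`, and `count_failures` are hypothetical names, the cache mirrors the TTL logic of `RotatingSecret` but takes an injectable clock, and plain strings stand in for `SecretStr` to keep the example standard-library only.

```python
from typing import Callable, Optional


class FakeClock:
    """Manually advanced clock so the test needs no real sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class CachedSecret:
    """TTL cache with an injectable clock (illustrative only)."""

    def __init__(self, fetch: Callable[[], str], ttl: float, clock: Callable[[], float]):
        self._fetch, self._ttl, self._clock = fetch, ttl, clock
        self._value: Optional[str] = None
        self._loaded_at = 0.0

    def get(self) -> str:
        if self._value is None or self._clock() - self._loaded_at > self._ttl:
            self._value = self._fetch()
            self._loaded_at = self._clock()
        return self._value


def count_failures(ttl: float, overlap: float, step: float = 10.0) -> int:
    """Simulate one rotation mid-run and count rejected requests."""
    clock = FakeClock()
    current = {"value": "old-pw"}  # what the fake secret store returns
    accepted = {"old-pw"}          # what the fake backend accepts
    secret = CachedSecret(lambda: current["value"], ttl, clock)
    rotate_at = 100.0
    revoke_at = rotate_at + overlap
    end = revoke_at + 2 * ttl + step
    rotated = revoked = False
    failures = 0
    while clock() <= end:
        if not rotated and clock() >= rotate_at:
            accepted.add("new-pw")        # provision new first
            current["value"] = "new-pw"
            rotated = True
        if not revoked and clock() >= revoke_at:
            accepted.discard("old-pw")    # revoke old after the overlap
            revoked = True
        if secret.get() not in accepted:  # one simulated request
            failures += 1
        clock.advance(step)
    return failures
```

A CI assertion such as `count_failures(ttl=300, overlap=600) == 0` then guards the configuration, and shrinking the overlap below the TTL makes the same function report failures.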
Security boundaries & guardrails
- Always provision-new-before-revoke-old; never revoke first.
- Keep the cache TTL strictly below the overlap window.
- Reconnect connection pools when the credential changes; do not leave old connections open.
- Wrap rotating credentials in SecretStr and unwrap only at the driver call.
- Alert on any secret whose age exceeds the rotation interval.
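The age alert in the last guardrail can be sketched as a small check. The function name `overdue_secrets` and the shape of its input (a mapping from secret name to last-rotation timestamp) are assumptions; how those timestamps are obtained depends on the store.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def overdue_secrets(
    last_rotated: dict[str, datetime],
    interval: timedelta,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return the names of secrets older than the rotation interval."""
    now = now or datetime.now(timezone.utc)
    return sorted(
        name for name, rotated_at in last_rotated.items() if now - rotated_at > interval
    )
```

Feeding the result into whatever alerting channel is already in place turns a silent expiry into a routine notification.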
Troubleshooting
- Requests fail during rotation — the overlap window is shorter than the cache TTL; widen the window or shorten the cache. See Zero-Downtime Secret Rotation in Python.
- Old credential still used after rotation — a connection pool was not recreated; reconnect on change.
- Rotation never triggers — the schedule or Lambda is misconfigured; assert rotation in staging.
- Credential expired before rotation — alerting is missing; add an age check.
Frequently asked questions
How do I rotate a secret without restarting the service?
Use a dual-credential overlap window. Provision the new credential while the old one still works, switch the application when its cache refreshes, then revoke the old credential once no process uses it.
How often should secrets be rotated?
On a fixed schedule sized to your risk tolerance — commonly 30–90 days for static secrets, and continuously for dynamic credentials on a short TTL. The point is that rotation is automated and routine, not a manual incident response.
What breaks most often during rotation?
Long-lived connections and over-long caches. A pool opened with the old credential keeps using it; size the cache TTL below the overlap window and reconnect pools when the credential changes.
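One way to reconnect only when the credential actually changes is to compare each freshly fetched value with the previous one. This is a minimal sketch; `ChangeNotifyingSecret` and its `on_change` callback are hypothetical names, and plain strings stand in for `SecretStr`.

```python
from typing import Callable, Optional


class ChangeNotifyingSecret:
    """Fetches a secret and calls on_change when the fetched value differs."""

    def __init__(self, fetch: Callable[[], str], on_change: Callable[[str], None]):
        self._fetch = fetch
        self._on_change = on_change
        self._value: Optional[str] = None

    def refresh(self) -> str:
        new_value = self._fetch()
        # Skip the first load: there is no existing pool to rebuild yet.
        if self._value is not None and new_value != self._value:
            self._on_change(new_value)  # e.g. rebuild the connection pool
        self._value = new_value
        return new_value
```

Passing a pool-rebuilding function as `on_change` keeps unchanged refreshes cheap while ensuring that connections opened with the old credential are replaced after a rotation.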
Conclusion
The invariant: provision before revoke, overlap longer than the cache, reconnect pools on change. Done right, rotation is a non-event your users never notice.