Automated Secret Rotation Patterns

Rotation that only happens after a breach is not rotation. The goal is to make credential replacement routine and invisible: the secret changes on a schedule, and not a single request fails. The mechanism is a dual-credential overlap window — both the old and new credentials work simultaneously while traffic shifts over.

Rotation is the operational backbone of the enterprise secrets section. The pattern is the same whether the store is AWS Secrets Manager or Vault.

[Figure: zero-downtime rotation overlap window]
  Phase 1: old credential only
  Phase 2 (overlap): old and new both valid
  Phase 3: new credential only, after the old one is revoked
During the overlap window both credentials work, so traffic shifts with no failed requests.
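
The three phases can be sketched against an in-memory stand-in for the store. `InMemoryCredentialStore`, `rotate`, and the `wait_for_overlap` callback are hypothetical names used for illustration, not the API of AWS Secrets Manager or Vault; a real store performs these steps through its own rotation mechanism.

```python
from typing import Callable


class InMemoryCredentialStore:
    """Hypothetical stand-in for a real store; tracks which credentials are valid."""

    def __init__(self, initial: str):
        self.valid: set[str] = {initial}
        self.current = initial

    def provision(self, new: str) -> None:
        """Start phase 2: the new credential becomes valid, the old one stays valid."""
        self.valid.add(new)
        self.current = new

    def revoke(self, credential: str) -> None:
        """Start phase 3: the given credential stops working."""
        if credential == self.current:
            raise ValueError("refusing to revoke the current credential")
        self.valid.discard(credential)

    def accepts(self, credential: str) -> bool:
        return credential in self.valid


def rotate(store: InMemoryCredentialStore, new: str,
           wait_for_overlap: Callable[[], None]) -> str:
    """Provision the new credential, wait out the overlap, then revoke the old one."""
    old = store.current
    store.provision(new)      # never revoke first
    wait_for_overlap()        # in production: at least the overlap window
    store.revoke(old)
    return old
```

In a real deployment `wait_for_overlap` would block, or be scheduled, for the full overlap window; here it is a callback so the ordering can be tested without sleeping.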

Secure implementation

# secrets/rotation.py
import time
from typing import Callable

from pydantic import SecretStr


class RotatingSecret:
    """Serves a secret from cache, refreshing within the overlap window."""

    def __init__(self, fetch: Callable[[], SecretStr], ttl: int = 300):
        self._fetch = fetch          # callable returning the current SecretStr
        self._ttl = ttl              # seconds; must be shorter than the overlap window
        self._value: SecretStr | None = None
        self._loaded_at = 0.0

    def get(self) -> SecretStr:
        # monotonic() is unaffected by wall-clock adjustments, so the TTL cannot jump
        if self._value is None or time.monotonic() - self._loaded_at > self._ttl:
            self._value = self._fetch()        # picks up the rotated value
            self._loaded_at = time.monotonic()
        return self._value


def reconnect_pool(pool, secret: RotatingSecret) -> None:
    # pool.recreate stands in for your driver's pool-rebuild call;
    # the secret is unwrapped only here, at the driver boundary
    pool.recreate(password=secret.get().get_secret_value())  # rebuild on rotation

The cache TTL is deliberately shorter than the store’s overlap window, so the application always picks up the new credential while the old one is still accepted.

Configuration reference

Element            Value                 Why
overlap window     > cache TTL           Both credentials valid during switch
cache TTL          5–15 min              Picks up new value promptly
rotation interval  30–90 days (static)   Routine, scheduled
dynamic lease      minutes–hours         Self-revoking, smallest blast radius
SecretStr          always                Masked in logs
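
One way to keep these values honest is to validate them when the application starts. The sketch below is an assumption about how such a settings object might look: `RotationConfig` and its field names are hypothetical, and the third check (rotation interval longer than the overlap window) is an added safeguard, not something the table requires.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RotationConfig:
    """Hypothetical settings object; field names are illustrative."""

    overlap_window: timedelta
    cache_ttl: timedelta = timedelta(minutes=5)
    rotation_interval: timedelta = timedelta(days=30)

    def __post_init__(self) -> None:
        if self.cache_ttl <= timedelta(0):
            raise ValueError("cache_ttl must be positive")
        # The key invariant from the table: overlap window > cache TTL.
        if self.overlap_window <= self.cache_ttl:
            raise ValueError("overlap_window must be strictly longer than cache_ttl")
        # Added assumption: a new rotation should not start while the
        # previous overlap is still open.
        if self.rotation_interval <= self.overlap_window:
            raise ValueError("rotation_interval must be longer than overlap_window")
```

Constructing `RotationConfig(overlap_window=timedelta(hours=1))` succeeds with the defaults, while an overlap window equal to or shorter than the cache TTL raises `ValueError` at startup instead of failing requests during a rotation.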

Deployment parity: local to production

  1. Local dev — test rotation by forcing a refresh and asserting the app reconnects.
  2. CI — a test rotates a fake secret mid-run and verifies no request fails.
  3. Staging — trigger a real rotation and watch for reconnection errors before production.
  4. Production — rotation runs on schedule; alerts fire if a credential nears expiry un-rotated.
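
The CI step can be approximated without any real store by simulating time in ticks. This is a minimal sketch under stated assumptions: `run_rotation_simulation` is a hypothetical helper, the "server" is just a set of accepted credentials, and the client uses the same refresh rule as `RotatingSecret.get()`.

```python
def run_rotation_simulation(cache_ttl: int, overlap: int,
                            rotate_at: int = 10, duration: int = 60) -> int:
    """Simulate one client for `duration` ticks; return the number of failed requests."""
    valid = {"old"}      # credentials the fake server accepts
    current = "old"      # value the fake store returns on fetch
    cached, loaded_at = None, 0

    failures = 0
    for now in range(duration):
        # Store side: provision the new credential first, revoke the old one later.
        if now == rotate_at:
            valid.add("new")
            current = "new"
        if now == rotate_at + overlap:
            valid.discard("old")

        # Client side: refetch only when the cached value is older than the TTL.
        if cached is None or now - loaded_at > cache_ttl:
            cached, loaded_at = current, now

        # One request per tick, authenticated with the cached credential.
        if cached not in valid:
            failures += 1
    return failures


def test_no_request_fails_when_ttl_is_below_overlap():
    assert run_rotation_simulation(cache_ttl=5, overlap=10) == 0


def test_requests_fail_when_ttl_exceeds_overlap():
    assert run_rotation_simulation(cache_ttl=15, overlap=5) > 0
```

The second test is the useful one to keep around: it shows that the simulation can detect a misconfigured window, so a green first test means something.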

Security boundaries & guardrails

  • Always provision-new-before-revoke-old; never revoke first.
  • Keep the cache TTL strictly below the overlap window.
  • Reconnect connection pools when the credential changes; do not leave old connections open.
  • Wrap rotating credentials in SecretStr and unwrap only at the driver call.
  • Alert on any secret whose age exceeds the rotation interval.

Troubleshooting

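  • Requests fail during rotation: the overlap window is shorter than the cache TTL, so some processes still hold the old credential when it is revoked. Widen the window or shorten the cache TTL. See Zero-Downtime Secret Rotation in Python.
  • Old credential still used after rotation: a connection pool was not recreated, so its connections were opened with the old credential. Reconnect on change.
  • Rotation never triggers: the rotation schedule or the rotation Lambda function is misconfigured. Trigger a rotation in staging and assert that the secret actually changes.
  • Credential expired before rotation: alerting is missing. Add a check on secret age against the rotation interval.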
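
For the stale-pool case, one possible approach is to rebuild the pool only when the fetched credential actually differs from the last one seen. This is a sketch, not a prescribed design: `PoolReconnector` is a hypothetical name, and `rebuild` is whatever callable recreates your pool (for example, a wrapper around the `pool.recreate(password=...)` call shown earlier).

```python
import hashlib
from typing import Callable


class PoolReconnector:
    """Calls `rebuild` only when the fetched credential differs from the last one seen."""

    def __init__(self, fetch: Callable[[], str], rebuild: Callable[[str], None]):
        self._fetch = fetch        # returns the current plaintext credential
        self._rebuild = rebuild    # recreates the pool with the given password
        self._fingerprint: str | None = None

    def check(self) -> bool:
        """Fetch the credential; rebuild the pool if it changed. Returns True on rebuild."""
        password = self._fetch()
        # Keep only a digest so the plaintext is not held on this object.
        fingerprint = hashlib.sha256(password.encode()).hexdigest()
        if fingerprint == self._fingerprint:
            return False
        self._rebuild(password)
        self._fingerprint = fingerprint
        return True
```

The first call to `check()` always rebuilds, because no fingerprint has been recorded yet. Calling it at an interval shorter than the overlap window means the pool moves to the new credential before the old one is revoked.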

Frequently asked questions

How do I rotate a secret without restarting the service?

Use a dual-credential overlap window. Provision the new credential while the old one still works, switch the application when its cache refreshes, then revoke the old credential once no process uses it.

How often should secrets be rotated?

On a fixed schedule sized to your risk tolerance — commonly 30–90 days for static secrets, and continuously for dynamic credentials on a short TTL. The point is that rotation is automated and routine, not a manual incident response.

What breaks most often during rotation?

Long-lived connections and over-long caches. A pool opened with the old credential keeps using it; size the cache TTL below the overlap window and reconnect pools when the credential changes.

Conclusion

The invariants: provision before revoke, keep the overlap window longer than the cache TTL, and reconnect pools on change. Done right, rotation is a non-event your users never notice.