Zero-downtime secret rotation in Python
The hard part of rotation is not generating a new credential — it is swapping it into a running service without a single failed request. The answer is a refreshing cache plus an overlap window where both credentials work. This page implements it, extending Automated Secret Rotation Patterns.
Problem 1: restart-to-rotate
# ANTI-PATTERN: the only way to pick up a new secret is a redeploy
SECRET = fetch_secret() # module-level; rotation requires restarting every pod
Tying rotation to a restart means downtime and a manual step.
Problem 2: a race on refresh
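# ANTI-PATTERN: two threads refresh at once, one overwrites the other
def get_secret():
    global SECRET
    if expired:
        SECRET = fetch_secret()  # no lock: every concurrent caller refetches
    return SECRET
Without a lock, every request that sees the expiry calls the store at the same time, and a slow fetch can overwrite a newer value with an older one.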
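The race can be made visible with a small script. The sketch below is an illustration only: fetch_secret is a stand-in that sleeps to simulate a slow secret store, and the thread count and delay are arbitrary choices. Every thread observes the expired flag before the first fetch returns, so each one calls the store.

```python
import threading
import time

fetch_count = 0
count_lock = threading.Lock()  # protects only the counter, not the refresh
SECRET = None
expired = True


def fetch_secret() -> str:
    """Hypothetical stand-in for a slow secret-store call."""
    global fetch_count
    with count_lock:
        fetch_count += 1
    time.sleep(0.2)  # simulated network latency
    return "s3cr3t"


def get_secret() -> str:
    """Unlocked refresh: the anti-pattern described above."""
    global SECRET, expired
    if expired:
        SECRET = fetch_secret()
        expired = False
    return SECRET


threads = [threading.Thread(target=get_secret) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"store was called {fetch_count} times for one expiry")
```

On a typical machine all eight threads start well within the simulated latency, so the store is hit several times for a single expiry.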
Secure implementation
# secrets/refresh.py
import threading
import time

from pydantic import SecretStr


class RefreshingSecret:
    def __init__(self, fetch, ttl: int = 300):
        self._fetch, self._ttl = fetch, ttl  # ttl < overlap window
        self._value: SecretStr | None = None
        self._at = 0.0
        self._lock = threading.Lock()

    def get(self) -> SecretStr:
        now = time.monotonic()
        if self._value is None or now - self._at > self._ttl:
            with self._lock:  # one refresh under contention
                if self._value is None or time.monotonic() - self._at > self._ttl:
                    self._value, self._at = self._fetch(), time.monotonic()
        return self._value


def on_rotation(pool, secret: RefreshingSecret) -> None:
    pool.recreate(password=secret.get().get_secret_value())  # graceful pool reload
The double-checked lock lets only one thread refresh per expiry while the others wait and reuse the result. The TTL stays below the store’s overlap window, so the old credential is still valid while the cache catches up. Pools are recreated rather than torn down mid-request.
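One way to check the single-refresh claim is to hit a cold cache from many threads and count store calls. The sketch below repeats the class in condensed form so that it runs on its own, and it returns a plain str instead of SecretStr to avoid the pydantic dependency. The slow_fetch function and the thread count are made up for the test.

```python
import threading
import time


class RefreshingSecret:
    """Condensed copy of the class above, using plain str values."""

    def __init__(self, fetch, ttl: int = 300):
        self._fetch, self._ttl = fetch, ttl
        self._value = None
        self._at = 0.0
        self._lock = threading.Lock()

    def _stale(self) -> bool:
        return self._value is None or time.monotonic() - self._at > self._ttl

    def get(self) -> str:
        if self._stale():  # cheap check without the lock
            with self._lock:
                if self._stale():  # re-check: another thread may have refreshed
                    self._value, self._at = self._fetch(), time.monotonic()
        return self._value


calls = 0


def slow_fetch() -> str:
    """Hypothetical store call; in this script it only runs under the cache lock."""
    global calls
    calls += 1
    time.sleep(0.1)  # simulated network latency
    return "new-password"


secret = RefreshingSecret(slow_fetch, ttl=300)
results = []  # list.append is thread-safe in CPython


def worker() -> None:
    results.append(secret.get())


workers = [threading.Thread(target=worker) for _ in range(16)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(f"{calls} store call(s) served {len(results)} requests")
```

Threads that arrive during the fetch block on the lock, find a fresh value on the second check, and return it without calling the store again.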
Gotchas & version-specific behaviour
- The cache TTL must be shorter than the credential overlap window or requests fail at the cutover.
- Use double-checked locking so a refresh under load does not stampede the secret store.
- Recreate connection pools on change; do not close active connections abruptly.
- Use time.monotonic() for all timing so clock changes cannot extend the TTL.
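The pool gotcha can be sketched as a swap-then-drain. Everything here is hypothetical: FakePool and PoolHolder are illustration-only names that count leases instead of holding real connections, and a real driver's pool will expose its own API for this. The idea is to build the new pool first, switch new requests to it, and close the old pool only once its in-flight requests have finished.

```python
import threading


class FakePool:
    """Hypothetical pool: tracks leases instead of real connections."""

    def __init__(self, password: str):
        self.password = password
        self.in_flight = 0
        self.retired = False
        self.closed = False

    def close_if_idle(self) -> None:
        if self.retired and self.in_flight == 0:
            self.closed = True  # a real pool would close its sockets here


class PoolHolder:
    """Swaps pools on rotation and drains the old one instead of killing it."""

    def __init__(self, password: str):
        self._lock = threading.Lock()
        self._current = FakePool(password)

    def lease(self) -> FakePool:
        with self._lock:  # lease and swap are serialised, so no lease is lost
            self._current.in_flight += 1
            return self._current

    def release(self, pool: FakePool) -> None:
        with self._lock:
            pool.in_flight -= 1
            pool.close_if_idle()

    def on_rotation(self, new_password: str) -> FakePool:
        fresh = FakePool(new_password)  # build the new pool before swapping
        with self._lock:
            old, self._current = self._current, fresh
            old.retired = True
            old.close_if_idle()
        return old


holder = PoolHolder("old-password")
busy = holder.lease()                     # a request is mid-flight
old = holder.on_rotation("new-password")
closed_during_request = old.closed        # False: the request keeps its pool
new_pool = holder.lease()                 # new requests land on the new pool
holder.release(busy)
closed_after_drain = old.closed           # True: closed once the last lease returns

print(closed_during_request, new_pool.password, closed_after_drain)
```

A single holder lock keeps the sketch short. It serialises leasing and swapping, which is what ensures a request never leases a pool that has already been closed.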
Production parity checklist
- Rotation needs no restart — the refreshing cache picks up new values.
- Cache TTL is below the overlap window.
- Refresh is thread-safe (double-checked lock).
- Pools reload gracefully on credential change.
- A staging drill forces rotation and asserts zero failed requests.
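The drill item can be rehearsed in-process before it is run against staging. The following is a simulation under stated assumptions, not a real drill: FakeStore is a made-up store that accepts the previous credential for overlap seconds after rotation, RefreshingCache is a variant of the cache with an injectable clock (an addition for testability), and traffic is one request per simulated second. Rotation is forced right after a cache fill, which is the worst case for a stale cache.

```python
import threading


class FakeStore:
    """Hypothetical secret store that honours an overlap window after rotation."""

    def __init__(self, overlap: float):
        self.overlap = overlap
        self.version = 1
        self.previous = None
        self.previous_expires = 0.0

    def current(self) -> str:
        return f"v{self.version}"

    def rotate(self, now: float) -> None:
        self.previous = self.current()
        self.previous_expires = now + self.overlap
        self.version += 1

    def accepts(self, password: str, now: float) -> bool:
        if password == self.current():
            return True
        return password == self.previous and now < self.previous_expires


class RefreshingCache:
    """Double-checked refresh as in the implementation above, with an injectable clock."""

    def __init__(self, fetch, ttl: float, clock):
        self._fetch, self._ttl, self._clock = fetch, ttl, clock
        self._value = None
        self._at = 0.0
        self._lock = threading.Lock()

    def _stale(self) -> bool:
        return self._value is None or self._clock() - self._at > self._ttl

    def get(self) -> str:
        if self._stale():
            with self._lock:
                if self._stale():
                    self._value, self._at = self._fetch(), self._clock()
        return self._value


def drill(ttl: float, overlap: float, rotate_at: int = 1, duration: int = 200) -> int:
    """Simulate one request per second and return the number of failed requests."""
    now = [0.0]  # fake clock, advanced by the loop
    store = FakeStore(overlap)
    cache = RefreshingCache(store.current, ttl, clock=lambda: now[0])
    failures = 0
    for second in range(duration):
        now[0] = float(second)
        if second == rotate_at:
            store.rotate(now[0])  # forced rotation right after a cache fill
        if not store.accepts(cache.get(), now[0]):
            failures += 1
    return failures


print(drill(ttl=30, overlap=60))  # TTL below the overlap window
print(drill(ttl=90, overlap=60))  # TTL above the overlap window
```

With these numbers the first drill reports 0 failed requests and the second reports 30: the cache keeps serving the old credential from second 61, when the overlap ends, until its refresh at second 91.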
Conclusion
A thread-safe refreshing cache sized under the overlap window turns rotation into a non-event — no restart, no failed request. For the overlap-window mechanics in the store, see Automated Secret Rotation Patterns.