Caching AWS secrets in memory securely

Calling get_secret_value on every request burns through Secrets Manager rate limits and adds latency; caching the result forever means you keep using a credential after it rotates. The fix is a thread-safe, TTL-bound, in-memory cache. This page extends AWS Secrets Manager Integration.

Problem 1: a fetch on every request

# ANTI-PATTERN: hammers the API and gets throttled
import boto3

def handler(event):
    # Builds a new client and calls the API on every invocation
    secret = boto3.client("secretsmanager").get_secret_value(SecretId="prod/db")
    return connect(secret["SecretString"])   # one API call per invocation

Under load this hits ThrottlingException and adds a network round-trip to every request.

Problem 2: a cache with no expiry

# ANTI-PATTERN: never refreshes, so rotation breaks the app
_SECRET = boto3.client("secretsmanager").get_secret_value(SecretId="prod/db")  # cached forever

Once the secret rotates, this module-level value is stale, and every new connection fails until the process is restarted or redeployed.

Secure implementation

# secrets/cache.py
import json
import threading
import time
import boto3
from pydantic import SecretStr

_client = boto3.client("secretsmanager")
_lock = threading.Lock()
_cache: dict[str, tuple[float, dict[str, SecretStr]]] = {}   # secret_id -> (fetched_at, values)
TTL = 600                                    # seconds; keep below rotation interval

def get_secret(secret_id: str) -> dict[str, SecretStr]:
    with _lock:                              # thread-safe: one fetch under contention
        now = time.monotonic()               # read the clock after any wait for the lock
        cached = _cache.get(secret_id)
        if cached and now - cached[0] < TTL:
            return cached[1]
        # Cache miss or expired entry: fetch while still holding the lock
        raw = _client.get_secret_value(SecretId=secret_id)["SecretString"]
        # Assumes SecretString is a JSON object whose values are all strings
        parsed = {k: SecretStr(v) for k, v in json.loads(raw).items()}
        _cache[secret_id] = (now, parsed)
        return parsed

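Because the fetch happens while the lock is held, concurrent requests for an expired entry trigger a single fetch, not a thundering herd. The trade-off is that callers wait behind any in-flight fetch, even one for a different secret. The TTL bounds how long a rotated value can stay in use: at most TTL seconds after rotation, the app re-fetches. SecretStr masks values in str() and repr() output, which keeps them out of accidental log lines.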
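The expiry and single-fetch behaviour can be checked without calling AWS by injecting the fetch function and the clock. The sketch below is a simplified stand-in, not the module above: TTLCache, fake_fetch, fake_now, and the invalidate method are hypothetical names introduced for illustration, and it stores plain dicts rather than SecretStr values.

```python
import threading
import time
from typing import Callable


class TTLCache:
    """Hypothetical stand-in for the cache, with injectable fetch and clock."""

    def __init__(self, fetch: Callable[[str], dict], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> dict:
        with self._lock:
            now = self._clock()
            entry = self._entries.get(key)
            if entry is not None and now - entry[0] < self._ttl:
                return entry[1]                    # fresh entry: no fetch
            value = self._fetch(key)               # miss or expired: fetch once
            self._entries[key] = (now, value)
            return value

    def invalidate(self, key: str) -> None:
        """Drop one entry so the next get() fetches again."""
        with self._lock:
            self._entries.pop(key, None)


calls: list[str] = []          # records every simulated API call
fake_now = [0.0]               # mutable fake clock, in seconds


def fake_fetch(key: str) -> dict:
    calls.append(key)
    return {"password": f"v{len(calls)}"}


cache = TTLCache(fake_fetch, ttl=600, clock=lambda: fake_now[0])

first = cache.get("prod/db")   # miss: calls fake_fetch
second = cache.get("prod/db")  # hit: served from memory
fake_now[0] = 601.0            # advance the fake clock past the TTL
third = cache.get("prod/db")   # expired: calls fake_fetch again
```

Three lookups produce two fetches here. An invalidate hook like the one sketched is one possible way to force an early refresh, for example after the database rejects a credential; whether that fits depends on the rotation strategy in use.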

Gotchas & version-specific behaviour

Use time.monotonic(), not time.time(), so a system clock adjustment can neither extend nor shorten the TTL.
The cache lives in memory only; never pickle it to disk or copy it into a shared cache.
In multi-process servers (for example gunicorn), each worker process has its own cache, so API calls scale with the worker count; size the TTL accordingly.
Set the TTL well below the Secrets Manager rotation interval; after a rotation, a worker can keep serving the old value for up to one TTL.
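As a rough sizing aid for the multi-process case: if each worker keeps its own cache and re-fetches every secret once per TTL, the steady-state call rate is about workers × secrets / TTL. The function name and the numbers below are illustrative assumptions, not measurements or AWS quota values.

```python
def steady_state_calls_per_second(workers: int, secrets: int, ttl_seconds: float) -> float:
    """Approximate GetSecretValue calls per second once every cache is warm."""
    return workers * secrets / ttl_seconds


# Illustrative only: 8 workers, 5 secrets each, TTL of 600 seconds
rate = steady_state_calls_per_second(workers=8, secrets=5, ttl_seconds=600)
calls_per_ttl_window = rate * 600   # total calls across all workers per TTL window
```

This estimate ignores bursts: after a deploy or restart, every worker starts with an empty cache and fetches at roughly the same time.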

Production parity checklist

TTL is shorter than the rotation interval.
Cache access is guarded by a lock; no duplicate concurrent fetches.
Values wrapped in SecretStr, unwrapped only at the driver call.
IAM scoped to secretsmanager:GetSecretValue on the exact secret ARN.
No secret is ever written to disk or a shared store.
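The IAM item above can be sketched as a policy document built in Python. The ARN is a placeholder: REGION, ACCOUNT_ID, and the trailing suffix are assumptions to be replaced with the values of the real secret, and SECRET_ARN and policy are names introduced only for this example.

```python
import json

# Placeholder ARN: substitute the region, account ID, and suffix of the actual secret
SECRET_ARN = "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:prod/db-SUFFIX"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",   # read only, no list or write actions
            "Resource": SECRET_ARN,                      # one secret, no wildcard
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

If the secret is encrypted with a customer managed KMS key, the caller typically also needs kms:Decrypt on that key.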

Conclusion

A locked TTL cache built on a monotonic clock removes per-request calls to Secrets Manager, which avoids throttling under normal load, while staying rotation-aware. Pair it with the rotation patterns so the app picks up new credentials automatically.