Caching AWS secrets in memory securely

Calling get_secret_value on every request burns through Secrets Manager rate limits and adds latency; caching the value forever means you keep using a credential after it has rotated. The fix is a thread-safe, TTL-bound, in-memory cache. This page extends AWS Secrets Manager Integration.

Problem 1: a fetch on every request

# ANTI-PATTERN: hammers the API and gets throttled
import boto3

def handler(event):
    # builds a new client and calls GetSecretValue on every invocation
    secret = boto3.client("secretsmanager").get_secret_value(SecretId="prod/db")
    return connect(secret["SecretString"])   # connect() is a placeholder for the app's DB call

Under load this can raise ThrottlingException, and it adds a network round-trip to every request even when it succeeds.

Problem 2: a cache with no expiry

# ANTI-PATTERN: never refreshes, so rotation breaks the app
_SECRET = boto3.client("secretsmanager").get_secret_value(SecretId="prod/db")  # cached forever

Once the secret rotates, this value is stale. New connections fail as soon as the old credential stops working, and only a restart or redeploy picks up the new value.

Secure implementation

# secrets/cache.py
import json
import threading
import time
import boto3
from pydantic import SecretStr

_client = boto3.client("secretsmanager")
_lock = threading.Lock()
_cache: dict[str, tuple[float, dict[str, SecretStr]]] = {}   # secret_id -> (fetched_at, values)
TTL = 600                                    # seconds; keep below rotation interval

def get_secret(secret_id: str) -> dict[str, SecretStr]:
    now = time.monotonic()
    with _lock:                              # thread-safe: one fetch under contention
        cached = _cache.get(secret_id)
        if cached and now - cached[0] < TTL:
            return cached[1]
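        # assumes SecretString holds a flat JSON object of key/value pairs
        raw = _client.get_secret_value(SecretId=secret_id)["SecretString"]
        # str() so non-string JSON values (e.g. a numeric port) are wrapped as text too
        parsed = {k: SecretStr(str(v)) for k, v in json.loads(raw).items()}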
        _cache[secret_id] = (now, parsed)
        return parsed

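One lock means concurrent requests trigger a single fetch, not a thundering herd. The trade-off is that the lock is held during the network call, so callers asking for any secret wait while one is being fetched. The TTL bounds staleness: after a rotation the app keeps using the old value for at most TTL seconds before it re-fetches. SecretStr masks values in str() and repr(), which keeps them out of logs unless code explicitly unwraps them with pydantic's .get_secret_value().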
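The single-fetch and expiry behaviour described above can be checked without touching AWS. The following is a minimal sketch, not the module itself: TTLCache, its fetch and clock parameters, and demo() are hypothetical names introduced here, and the fetcher is a stand-in for the Secrets Manager call. It uses only the standard library so it can run anywhere.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical helper: same locking and TTL logic as get_secret,
    with the fetcher and the clock injected so they can be faked."""

    def __init__(self, fetch: Callable[[str], dict], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[float, dict]] = {}
        self.fetch_count = 0          # how many times the fetcher really ran

    def get(self, key: str) -> dict:
        with self._lock:              # one thread at a time checks and fetches
            now = self._clock()
            entry = self._entries.get(key)
            if entry is not None and now - entry[0] < self._ttl:
                return entry[1]       # still fresh: no fetch
            value = self._fetch(key)
            self.fetch_count += 1
            self._entries[key] = (now, value)
            return value


def demo() -> Tuple[int, int]:
    fake_now = [0.0]                  # fake monotonic clock, advanced by hand
    cache = TTLCache(
        fetch=lambda key: {"password": "stand-in for " + key},  # not a real AWS call
        ttl=600,
        clock=lambda: fake_now[0],
    )
    # Eight concurrent callers ask for the same secret.
    threads = [threading.Thread(target=cache.get, args=("prod/db",)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    after_burst = cache.fetch_count
    fake_now[0] = 601.0               # move past the TTL
    cache.get("prod/db")
    return after_burst, cache.fetch_count
```

demo() returns (1, 2): the burst of eight callers causes one fetch, and the first call after the TTL has elapsed causes a second. Injecting the clock is a testing convenience only; in production the default time.monotonic applies.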

Gotchas & version-specific behaviour

  • Use time.monotonic(), not time.time(), so a wall-clock adjustment (an NTP step or a manual change) cannot extend or cut short the TTL.
  • The cache lives in memory only — never pickle it to disk or a shared cache.
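  • In multi-process servers (gunicorn), each worker process has its own cache, so Secrets Manager call volume scales with the worker count; factor that in when choosing the TTL.
  • Set the TTL strictly below the Secrets Manager rotation interval. After a rotation the app can still serve the old value for up to one TTL, so keep the TTL short enough that this window is acceptable.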
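To make the per-worker point concrete, here is a rough back-of-the-envelope sketch. The function name max_fetches_per_hour and the example numbers are illustrative assumptions, and the result is an upper bound that assumes every worker keeps every secret continuously refreshed.

```python
def max_fetches_per_hour(workers: int, secrets: int, ttl_seconds: float) -> float:
    """Upper bound on GetSecretValue calls per hour when each worker
    process keeps its own cache and refreshes each secret once per TTL."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    refreshes_per_hour = 3600 / ttl_seconds
    return workers * secrets * refreshes_per_hour


# Example figures (assumed): 8 gunicorn workers, 3 secrets.
print(max_fetches_per_hour(8, 3, 600))   # 144.0 with a 10-minute TTL
print(max_fetches_per_hour(8, 3, 60))    # 1440.0 with a 1-minute TTL
```

Shortening the TTL shrinks the stale window after rotation but raises call volume in proportion, and that cost is multiplied by the number of worker processes.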

Production parity checklist

  • TTL is shorter than the rotation interval.
  • Cache access is guarded by a lock; no duplicate concurrent fetches.
  • Values are wrapped in SecretStr and unwrapped with .get_secret_value() only at the driver call.
  • IAM is scoped to secretsmanager:GetSecretValue on the exact secret ARN.
  • No secret is ever written to disk or a shared store.
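As an illustration of the IAM item, a least-privilege policy document can be sketched as a Python dict. The helper name read_only_policy and the ARN below are placeholders, not values from this page; substitute the real ARN of your secret.

```python
import json


def read_only_policy(secret_arn: str) -> dict:
    """Build an IAM policy document allowing only GetSecretValue
    on a single secret. Hypothetical helper for illustration."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,   # exact ARN, no wildcard
            }
        ],
    }


# Placeholder ARN: account ID and random suffix are made up.
EXAMPLE_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/db-AbCdEf"
print(json.dumps(read_only_policy(EXAMPLE_ARN), indent=2))
```

The printed JSON can be attached to the application's role through whatever tooling you already use to manage IAM.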

Conclusion

A locked TTL cache based on the monotonic clock removes the per-request API calls that cause throttling while staying rotation-aware. Pair it with the rotation patterns so the app picks up new credentials automatically.