YAML & JSON Parsing Strategies
Flat environment variables cannot express a routing table or a per-tenant feature map; structured files can. But yaml.load with a full-power loader is a remote-code-execution vector: a single !!python/object/apply tag in a config file can run arbitrary code during parsing. This page shows how to parse structured configuration safely and then validate its shape.
Structured files are the source the configuration pipeline reaches for when data is nested. The parsed result should always flow into a pydantic-settings model so its structure is validated, never trusted raw.
Secure implementation
# config/structured.py
import json
import tomllib  # stdlib in Python 3.11+
from pathlib import Path

import yaml
from pydantic import BaseModel, ConfigDict


class CacheConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")  # reject unexpected keys

    host: str
    port: int = 6379
    ssl: bool = True


class FileConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")  # reject unexpected keys

    cache: CacheConfig
    feature_flags: dict[str, bool] = {}


def load_config(path: Path) -> FileConfig:
    """Parse a YAML, JSON, or TOML file and validate it against FileConfig."""
    text = path.read_text(encoding="utf-8")
    if path.suffix in {".yaml", ".yml"}:
        raw = yaml.safe_load(text)  # NEVER yaml.load on untrusted input
    elif path.suffix == ".json":
        raw = json.loads(text)
    elif path.suffix == ".toml":
        raw = tomllib.loads(text)
    else:
        raise ValueError(f"Unsupported config format: {path.suffix}")
    return FileConfig.model_validate(raw)  # validate shape before use
safe_load removes the code-execution risk; model_validate turns an untyped dict into a checked object, rejecting missing keys and wrong types at load time.
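As a rough illustration of what the validation step catches, the sketch below re-declares the two models with extra="forbid" set and feeds them three malformed dicts plus one valid one. The sample payloads and the describe_errors helper are hypothetical, and the error-type strings assume pydantic v2.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class CacheConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")
    host: str
    port: int = 6379
    ssl: bool = True


class FileConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")
    cache: CacheConfig
    feature_flags: dict[str, bool] = {}


def describe_errors(raw: dict) -> list[str]:
    """Hypothetical helper: return 'location: error type' per validation failure."""
    try:
        FileConfig.model_validate(raw)
    except ValidationError as exc:
        return [
            ".".join(str(part) for part in err["loc"]) + ": " + err["type"]
            for err in exc.errors()
        ]
    return []


# Required key absent.
missing = describe_errors({"cache": {}})
# Value cannot be coerced to int.
wrong_type = describe_errors({"cache": {"host": "redis", "port": "not-a-port"}})
# Misspelled key, caught only because extra="forbid" is set.
typo = describe_errors({"cache": {"host": "redis", "prot": 6380}})
# Well-formed input produces no errors.
valid = describe_errors({"cache": {"host": "redis"}})
```

Each failure names the exact nested location, which is what makes a malformed edit quick to find.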
Configuration reference
| Format | Loader | Safe call | Use when |
|---|---|---|---|
| YAML | PyYAML | yaml.safe_load | Nested config, anchors |
| JSON | stdlib | json.loads | Machine-generated config |
| TOML | tomllib | tomllib.loads | Hand-edited app settings |
| any | pydantic | model_validate | Validating parsed structure |
Deployment parity: local to production
- Local dev: load the committed config.yaml; validate with the model so a malformed edit fails immediately.
- CI: parse and validate every config file as a test; a broken file fails the build, not the deploy.
- Staging/Production: mount the same file (or a per-environment overlay) read-only; the identical model validates it on boot.
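The CI step could be sketched roughly as follows. check_config_dir is a hypothetical helper that runs a loader over every file in a directory and collects failures; demo_loader is a stdlib-only stand-in for load_config so the example runs on its own, and the file names are invented for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def check_config_dir(directory: Path, load: Callable[[Path], object]) -> dict[str, str]:
    """Run `load` on every file in `directory`; map failing file names to error text."""
    failures: dict[str, str] = {}
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        try:
            load(path)
        except Exception as exc:  # any parse or validation error should fail the build
            failures[path.name] = f"{type(exc).__name__}: {exc}"
    return failures


def demo_loader(path: Path) -> dict:
    """Stand-in for load_config: parse JSON and require a 'cache' key."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    if "cache" not in raw:
        raise ValueError("missing required key: cache")
    return raw


with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)
    (config_dir / "good.json").write_text('{"cache": {"host": "redis"}}', encoding="utf-8")
    (config_dir / "broken.json").write_text('{"cache": ', encoding="utf-8")
    (config_dir / "incomplete.json").write_text('{"feature_flags": {}}', encoding="utf-8")
    failures = check_config_dir(config_dir, demo_loader)

# In a real test suite this would end with: assert not failures, failures
```

Collecting every failure before asserting means one CI run reports all broken files rather than stopping at the first.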
Security boundaries & guardrails
- yaml.safe_load only: yaml.load and yaml.full_load are forbidden on any file you do not fully control.
- Set a size limit before parsing to limit exposure to billion-laughs (alias-expansion) attacks.
- Validate the parsed dict with extra="forbid" so unexpected keys are rejected, not silently kept.
- Keep secrets out of structured config files; reference them from a secret store instead.
- Mount config files read-only in containers.
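One way to apply the size limit is to cap the number of bytes read before any parser sees the text. The sketch below is a minimal version of that idea; read_bounded and the MAX_CONFIG_BYTES value are assumptions for illustration, not part of any library.

```python
import tempfile
from pathlib import Path

MAX_CONFIG_BYTES = 1_000_000  # assumed ceiling; tune to your largest real config


def read_bounded(path: Path, max_bytes: int = MAX_CONFIG_BYTES) -> str:
    """Return the file's text, refusing anything larger than `max_bytes`."""
    with path.open("rb") as handle:
        data = handle.read(max_bytes + 1)  # one extra byte reveals an oversized file
    if len(data) > max_bytes:
        raise ValueError(f"{path.name} exceeds {max_bytes} bytes; refusing to parse")
    return data.decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    small = Path(tmp) / "small.yaml"
    small.write_bytes(b"cache:\n  host: redis\n")
    big = Path(tmp) / "big.yaml"
    big.write_bytes(b"x" * 2048)

    small_text = read_bounded(small, max_bytes=1024)
    try:
        read_bounded(big, max_bytes=1024)
        oversized_rejected = False
    except ValueError:
        oversized_rejected = True
```

A byte cap bounds only the input, and a heavily aliased YAML document can be small on disk, so treat this as one layer alongside safe_load and model validation rather than a complete defense.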
Troubleshooting
- ConstructorError on load: the file uses a tag that safe_load refuses; that tag is exactly what makes yaml.load dangerous. Remove it.
- Nested value is a string, not a dict: usually an indentation error in the YAML; validate with the model to localize the failure. See Handling Nested Configuration in YAML Safely.
- ValidationError with "Extra inputs are not permitted": a key is mistyped or unknown; with extra="forbid" this is caught instead of ignored. (pydantic v1 words this as "extra fields not permitted".)
- ModuleNotFoundError for tomllib on older Python: tomllib is in the standard library only from 3.11; on 3.10 and earlier, install tomli.
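For code that has to run on both sides of the 3.11 boundary, a common pattern is a guarded import. The sketch assumes the third-party tomli backport is installed on older interpreters; the sample TOML document is invented for illustration.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport exposing the same loads/load API

sample = """\
[cache]
host = "redis"
port = 6380
"""

settings = tomllib.loads(sample)  # plain nested dict, ready for model validation
```

Aliasing the backport to the stdlib name keeps the rest of the module identical on every Python version.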
Frequently asked questions
Why is yaml.load dangerous?
yaml.load can construct arbitrary Python objects from a YAML document when it runs with an unsafe loader (yaml.Loader or yaml.UnsafeLoader, which was the default in older PyYAML releases), so a malicious or compromised config file can execute code during parsing. Always use yaml.safe_load, which only builds standard scalars, lists, and dicts.
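The difference is easy to observe without running anything dangerous. In this sketch the hostile document is hypothetical and names a harmless function; safe_load never calls it and raises ConstructorError instead, while plain data parses normally.

```python
import yaml

# Hypothetical hostile document: the tag asks the loader to call os.getcwd().
# A harmless function is named here; an attacker would pick something destructive.
hostile = "cwd: !!python/object/apply:os.getcwd []\n"

try:
    yaml.safe_load(hostile)
    rejected = False
except yaml.constructor.ConstructorError:
    # safe_load has no constructor registered for python/* tags, so it refuses.
    rejected = True

# Ordinary data parses to plain Python containers and scalars.
plain = yaml.safe_load("cache:\n  host: redis\n  port: 6379\n  ssl: true\n")
```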
Should I use YAML, JSON, or TOML for Python configuration?
Use JSON for machine-generated config, TOML for hand-edited application settings (the pyproject.toml standard), and YAML when you need anchors or deep nesting — but only ever parse YAML with safe_load.
How do I validate the structure of a parsed config file?
Parse to a plain dict with yaml.safe_load, json.loads, or tomllib.loads, then pass that dict into a pydantic model so types, required keys, and constraints are enforced before the values reach your application.
Conclusion
The invariant: structured config is parsed with the safe loader for its format and immediately validated by a model with extra="forbid". Never let an unvalidated dict from a file reach application code.