YAML & JSON Parsing Strategies

Flat environment variables cannot express a routing table or a per-tenant feature map; structured files can. But yaml.load with an unsafe loader is a remote-code-execution vector: a single !!python/object/apply tag in a config file can run arbitrary code during parsing. This page shows how to parse structured configuration safely and then validate its shape.

Structured files are the source the configuration pipeline reaches for when data is nested. The parsed result should always flow into a pydantic-settings model so its structure is validated, never trusted raw.

Secure implementation

# config/structured.py
import json
import tomllib                      # stdlib in Python 3.11+
from pathlib import Path

import yaml
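from pydantic import BaseModel, ConfigDict


class CacheConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")   # reject unknown keys

    host: str
    port: int = 6379
    ssl: bool = True


class FileConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")   # reject unknown keys

    cache: CacheConfig
    feature_flags: dict[str, bool] = {}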


def load_config(path: Path) -> FileConfig:
    text = path.read_text(encoding="utf-8")   # do not rely on the platform default encoding
    if path.suffix in {".yaml", ".yml"}:
        raw = yaml.safe_load(text)      # NEVER yaml.load on untrusted input
    elif path.suffix == ".json":
        raw = json.loads(text)
    elif path.suffix == ".toml":
        raw = tomllib.loads(text)
    else:
        raise ValueError(f"Unsupported config format: {path.suffix}")
    return FileConfig.model_validate(raw)   # validate shape before use

safe_load removes the code-execution risk; model_validate turns an untyped dict into a checked object, rejecting missing keys and wrong types at load time.
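To make the validation step concrete, here is a minimal sketch of what the model reports for a malformed dict. It assumes pydantic v2; the helper name validation_problems and the sample input are hypothetical, chosen only for illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class CacheConfig(BaseModel):
    # extra="forbid" makes unknown keys an error instead of dropping them
    model_config = ConfigDict(extra="forbid")

    host: str
    port: int = 6379
    ssl: bool = True


def validation_problems(raw: dict) -> list[tuple[str, str]]:
    """Return (field path, error type) pairs; an empty list means valid."""
    try:
        CacheConfig.model_validate(raw)
    except ValidationError as exc:
        return [
            (".".join(str(part) for part in err["loc"]), err["type"])
            for err in exc.errors()
        ]
    return []


# Hypothetical bad input: "host" is misspelled and "port" is not a number.
problems = validation_problems({"hots": "redis.internal", "port": "not-a-number"})
for location, kind in problems:
    print(f"{location}: {kind}")
```

Each problem is reported per field, so a single load surfaces the missing key, the unparseable port, and the misspelled key together rather than one at a time.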

Configuration reference

Format   Loader     Safe call         Use when
YAML     PyYAML     yaml.safe_load    Nested config, anchors
JSON     stdlib     json.loads        Machine-generated config
TOML     tomllib    tomllib.loads     Hand-edited app settings
any      pydantic   model_validate    Validating parsed structure

Deployment parity: local to production

  1. Local dev — load the committed config.yaml; validate with the model so a malformed edit fails immediately.
  2. CI — parse and validate every config file as a test; a broken file fails the build, not the deploy.
  3. Staging/Production — mount the same file (or a per-environment overlay) read-only; the identical model validates it on boot.
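The CI step above can be sketched as a small helper that runs a loader over every config file and collects the failures. The name find_invalid_configs is hypothetical, and the loader is passed in so the sketch stays independent of any particular format; in practice it would be the load_config function from this page.

```python
from pathlib import Path
from typing import Callable, Iterable


def find_invalid_configs(
    paths: Iterable[Path],
    load: Callable[[Path], object],
) -> dict[str, str]:
    """Run `load` on each path and map failing paths to a short reason."""
    failures: dict[str, str] = {}
    for path in paths:
        try:
            load(path)
        except Exception as exc:  # broad on purpose: any failure should break the build
            failures[str(path)] = f"{type(exc).__name__}: {exc}"
    return failures


# In a pytest suite this might look like (the directory name is an assumption):
#
#     def test_config_files_are_valid():
#         files = sorted(Path("config").glob("*.yaml"))
#         assert find_invalid_configs(files, load_config) == {}
```

Returning a mapping rather than raising on the first error lets one CI run list every broken file.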

Security boundaries & guardrails

  • yaml.safe_load only — yaml.load and yaml.full_load are forbidden on any file you do not fully control.
  • Set a size limit before parsing to limit exposure to billion-laughs-style attacks (alias expansion in YAML).
  • Validate the parsed dict with extra="forbid" so unexpected keys are rejected, not silently kept.
  • Keep secrets out of structured config files; reference them from a secret store instead.
  • Mount config files read-only in containers.
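The size-limit guardrail can be sketched as a bounded read that runs before any parser sees the text. The function name read_bounded and the 1 MB ceiling are assumptions for illustration, not values from this page; pick a limit that fits your largest legitimate config.

```python
from pathlib import Path

MAX_CONFIG_BYTES = 1_000_000  # hypothetical ceiling; tune for your configs


def read_bounded(path: Path, max_bytes: int = MAX_CONFIG_BYTES) -> str:
    """Read a config file as UTF-8 text, refusing anything over max_bytes."""
    with path.open("rb") as fh:
        # Read one byte past the limit so an oversized file is detected
        # without loading all of it into memory.
        data = fh.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError(f"{path} exceeds the {max_bytes}-byte config limit")
    return data.decode("utf-8")
```

A cap like this bounds the input the parser receives, not what the parsed structure can expand to, so it complements safe_load and model validation rather than replacing them.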

Troubleshooting

  • ConstructorError on load: the file uses a tag safe_load refuses; that tag is exactly what makes yaml.load dangerous. Remove it.
  • Nested value is a string, not a dict: indentation error in YAML; validate with the model to localize the failure. See Handling Nested Configuration in YAML Safely.
  • ValidationError reporting "Extra inputs are not permitted" ("extra fields not permitted" in pydantic v1): a typo’d key; with extra="forbid" this is caught instead of ignored.
  • ModuleNotFoundError for tomllib on older Python: tomllib is 3.11+; on 3.10 install tomli.
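For the tomllib case, one common pattern is a version-gated import that falls back to the tomli backport, which exposes the same loads and load functions. This is a sketch that assumes tomli is installed on interpreters older than 3.11; the sample TOML text is made up.

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib
else:
    # Backport with the same API; requires `pip install tomli`.
    import tomli as tomllib

# Hypothetical settings text, parsed the same way on either interpreter.
settings = tomllib.loads('[cache]\nhost = "redis.internal"\nport = 6379\n')
print(settings["cache"]["port"])
```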

Frequently asked questions

Why is yaml.load dangerous?

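With an unsafe loader such as yaml.Loader or yaml.UnsafeLoader, yaml.load can construct arbitrary Python objects from a YAML document, so a malicious or compromised config file can execute code during parsing. Since PyYAML 6.0 the Loader argument is required, so the choice is always explicit. Always use yaml.safe_load, which only builds standard scalars, lists, and dicts.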
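A short sketch shows the safe loader refusing such a document. It assumes PyYAML is installed; the hostile payload and the helper name is_rejected are illustrative. The payload is never executed, because safe_load has no constructor for the tag.

```python
import yaml
from yaml.constructor import ConstructorError

# Illustrative hostile document: the tag asks the loader to call os.system.
HOSTILE = "cache: !!python/object/apply:os.system ['echo pwned']\n"
BENIGN = "cache:\n  host: redis.internal\n"


def is_rejected(document: str) -> bool:
    """Return True if yaml.safe_load refuses the document's tags."""
    try:
        yaml.safe_load(document)
    except ConstructorError:
        return True
    return False


print(is_rejected(HOSTILE))
print(is_rejected(BENIGN))
```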

Should I use YAML, JSON, or TOML for Python configuration?

Use JSON for machine-generated config, TOML for hand-edited application settings (the pyproject.toml standard), and YAML when you need anchors or deep nesting — but only ever parse YAML with safe_load.

How do I validate the structure of a parsed config file?

Parse to a plain dict with yaml.safe_load, json.loads, or tomllib.loads, then pass that dict into a pydantic model so types, required keys, and constraints are enforced before the values reach your application.

Conclusion

The invariant: structured config is parsed with the safe loader for its format and immediately validated by a model with extra="forbid". Never let an unvalidated dict from a file reach application code.