How to Convert YAML to JSON: Common Gotchas to Avoid
Why YAML and JSON Aren't as Interchangeable as They Look
YAML and JSON look similar, and their relationship is close. YAML 1.2 is even a superset of JSON, so any valid JSON is also valid YAML. That sounds great, right? It is, until you convert a real-world YAML file and discover your first silent data corruption. The two formats simply have different design goals. JSON was built for machines: strict, unambiguous, and with no room for comments. YAML, on the other hand, was built for humans. It uses indentation for structure, supports multi-line strings, allows inline comments, and features a type inference system that tries to guess your intent.

This helpful guessing is precisely where conversions go sideways. A YAML 1.1 parser reads an unquoted `yes` as the boolean `true`. Any parser reads an unquoted `1.0` as a float, not the string you thought you typed. These aren't bugs; the YAML spec is working as designed. The issue arises because JSON has no such ambiguity. Once your YAML value becomes a boolean in the parsed data, the JSON output will write `true`, and the original text is lost for good.

When you're converting configuration files for a Kubernetes cluster, an OpenAPI spec, or a CI/CD pipeline, these silent type changes can break everything downstream without a single error message. To convert files reliably, you have to understand these fundamental differences.
The Fastest Way to Convert: Using CocoConvert
When you just need to convert a file, the fastest way is to use a dedicated tool instead of cobbling together a script. CocoConvert's [YAML to JSON converter](/convert/yaml-to-json) manages all the parsing and serialization, giving you properly formatted, UTF-8 encoded output right away. The process is dead simple: paste your YAML or upload a .yaml or .yml file, then click Convert. Your JSON appears in the output panel, ready to be copied or downloaded.

CocoConvert uses modern YAML 1.2 parsing rules, so you won't get bitten by the old "Norway Problem," where the unquoted country code `NO` was misinterpreted as the boolean `false`. If your source YAML has an indentation mistake, you'll get a clear parse error with a line number instead of silently mangled output.

It also handles multi-document YAML files (those with `---` separators). These are converted into a JSON array, where each document becomes an array element. This is the expected behavior, but it's worth remembering if you see an unexpected array in your output: check whether your file contains `---` separators.

There is one limitation: the tool doesn't support YAML anchors and aliases that reference nodes across different documents in the same file. For those cases, you will need to resolve the cross-document anchors first, either by hand or with a local script, before uploading the file for conversion.
YAML Type Coercion: The Gotchas That Bite Hardest
Type coercion is the number one cause of data loss when converting from YAML to JSON. Before you convert any production file, audit it for these specific gotchas.

**Booleans from unexpected strings.** YAML 1.1 parsers interpret unquoted `yes`, `no`, `on`, and `off` as booleans. That includes PyYAML, which still follows YAML 1.1 resolution rules in its current releases. The YAML 1.2 core schema only treats `true` and `false` (and their capitalized variants) this way. The same file can therefore produce different JSON depending on which parser reads it. If a value is meant to be the string 'yes', quote it in the source. If you don't know the file's origin or which parser will read it, check for these values manually.

**Octal integers.** This one is a classic. Under YAML 1.1, an unquoted value like `0755` is parsed as the octal integer 493. Under YAML 1.2, octal is written `0o755`, so `0755` is read as the decimal integer 755. This is a notorious trap in Kubernetes manifests for setting file permissions. Either way, your JSON will contain a number (`493` or `755`), not the string `"0755"`. If a downstream process tries to use that number in a `chmod` call, the permissions can be completely wrong, and you'll get no error.

**Floating-point edge cases.** YAML understands special float values like `.inf`, `-.inf`, and `.nan`. JSON does not. CocoConvert handles this by converting them to the strings 'Infinity', '-Infinity', and 'NaN'. This is a sensible fallback, but if your application expects only numbers, it might fail on these string values, requiring post-processing.

**Null representations.** YAML is flexible with nulls, accepting `null`, `~`, or even just an empty value after a key. All of these become a standard `null` in JSON. This is usually fine, but remember that a key with nothing after the colon becomes a JSON `null`, not an empty string `""`.
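As a rough aid for the audit described above, the sketch below scans YAML text line by line and flags unquoted values that commonly change type: the YAML 1.1 boolean words, leading-zero integers, and the special floats. It uses only the Python standard library and a deliberately naive regex over `key: value` lines, so it ignores list items, block scalars, flow collections, and empty values. The function name `find_risky_scalars` and the reason labels are illustrative, not part of any library.

```python
import re

# Unquoted words that YAML 1.1 parsers resolve to booleans (case-insensitive).
BOOL_WORDS = {"yes", "no", "on", "off"}
# Integers with a leading zero: octal in YAML 1.1, decimal in YAML 1.2.
LEADING_ZERO = re.compile(r"^0[0-7]+$")
# Special floats that JSON cannot represent.
SPECIAL_FLOAT = re.compile(r"^[-+]?\.(inf|nan)$", re.IGNORECASE)
# Naive "key: value" matcher that skips comment-only lines.
KEY_VALUE = re.compile(r"^\s*[^#\s][^:]*:\s*(.*?)\s*$")


def find_risky_scalars(yaml_text):
    """Return (line_number, value, reason) tuples for suspicious plain scalars."""
    findings = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        match = KEY_VALUE.match(line)
        if not match:
            continue
        # Drop a trailing comment (naive: assumes no " #" inside the value).
        value = match.group(1).split(" #", 1)[0].strip()
        if value[:1] in ("'", '"'):
            continue  # quoted scalars keep their string type
        if value.lower() in BOOL_WORDS:
            findings.append((number, value, "boolean in YAML 1.1"))
        elif LEADING_ZERO.match(value):
            findings.append((number, value, "octal in YAML 1.1, decimal in 1.2"))
        elif SPECIAL_FLOAT.match(value):
            findings.append((number, value, "not representable in JSON"))
    return findings


sample = """\
enabled: yes
mode: 0755
name: "no"
ratio: .inf
country: NO  # Norway
port: 8080
"""

for finding in find_risky_scalars(sample):
    print(finding)
```

A hit is only a prompt to look at the line: whether the value should be quoted depends on what the consuming application expects.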
Handling Multi-Line Strings and Comments
YAML offers two powerful multi-line string syntaxes that have no direct equivalent in JSON: literal block scalars (`|`) and folded block scalars (`>`). A literal block (`|`) preserves every single newline. A folded block (`>`) turns single newlines into spaces but keeps double newlines as actual newlines. Both syntaxes produce a single JSON string, but the subtle differences in newline handling are critical for embedded content like shell scripts, SQL queries, or certificates. For example, this YAML:

```yaml
script: |
  echo hello
  echo world
```

becomes this JSON:

```json
{"script": "echo hello\necho world\n"}
```

Notice the trailing newline (`\n`) is preserved by default with the literal `|` style. To strip it, you'd use the chomping indicator `|-`. Anyone who has debugged a CI script failing on a subtle whitespace difference knows this pain. Getting this wrong can break scripts or APIs that are sensitive to whitespace.

Comments are a much harder problem. YAML supports comments using `#`. JSON does not. Period. This means that during conversion, every single comment in your YAML file is permanently deleted. All the crucial context explaining *why* a certain value is set, a common practice in infrastructure-as-code, vanishes from the JSON output. There's no workaround within the JSON spec. My recommendation is simple: always treat your commented YAML as the source of truth and the generated JSON as a disposable build artifact. Some teams use JSONC (JSON with Comments), but that just kicks the compatibility can down the road.
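To make the newline rules concrete, the sketch below writes out by hand the strings a spec-compliant parser would return for the same two-line script under `|`, `|-`, and `>`, then serializes each with the standard `json` module. The values are hand-written stand-ins rather than real parser output, and `keys_with_trailing_newline` is a hypothetical helper name used only for illustration.

```python
import json

# Hand-written stand-ins for what a parser returns for this block body:
#   echo hello
#   echo world
parsed = {
    "literal_clip": "echo hello\necho world\n",  # script: |
    "literal_strip": "echo hello\necho world",   # script: |-
    "folded_clip": "echo hello echo world\n",    # script: >
}

# Show how each variant looks once it is encoded as a JSON string.
for style, value in parsed.items():
    print(style, json.dumps(value))


def keys_with_trailing_newline(mapping):
    """List keys whose string value ends in a newline (a hint to consider `|-`)."""
    return [k for k, v in mapping.items() if isinstance(v, str) and v.endswith("\n")]


print(keys_with_trailing_newline(parsed))
```

A check like this can run over converted data before it is handed to a whitespace-sensitive consumer.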
Anchors, Aliases, and Merge Keys
YAML's anchors and aliases are a fantastic feature for keeping your files DRY (Don't Repeat Yourself), but they introduce complexity during JSON conversion. You define an anchor with `&anchor-name` and then reference it with `*anchor-name`. A YAML parser expands these aliases as it reads the file, building the final data structure in memory. The JSON output, therefore, contains the fully expanded, duplicated content, with no trace of the original anchors. Consider this common pattern:

```yaml
defaults: &defaults
  timeout: 30
  retries: 3

production:
  <<: *defaults
  host: prod.example.com

staging:
  <<: *defaults
  host: staging.example.com
```

The `<<` syntax is a YAML merge key. The resulting JSON will be:

```json
{
  "defaults": {"timeout": 30, "retries": 3},
  "production": {"timeout": 30, "retries": 3, "host": "prod.example.com"},
  "staging": {"timeout": 30, "retries": 3, "host": "staging.example.com"}
}
```

The expansion is correct, but the conciseness of the original YAML is gone. If 50 services inherited from that defaults anchor, the JSON will contain 50 copies of that data. For a machine, this is perfectly fine. For a human trying to read the file, or for systems where file size is a concern, it's a significant drawback.

Be aware that the merge key (`<<`) was defined as an optional type for YAML 1.1 and is not part of the YAML 1.2 core schema, so some strict parsers will reject it or treat it as an ordinary key. CocoConvert handles merge keys without issue. If you're scripting a conversion with Python's PyYAML, use `yaml.safe_load()`, or `yaml.full_load()` if you trust the input and need its extra tags. Avoid `yaml.load()` without a `Loader` argument: that form has been deprecated since PyYAML 5.1 due to major security risks, and PyYAML 6.0 makes the argument mandatory.
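What a parser does with `<<` can be approximated with plain Python dicts: copy the anchored mapping, then let the local keys win. The sketch below rebuilds the example by hand using only the standard library, so it illustrates the semantics rather than any parser's implementation; `apply_merge` is a hypothetical helper name.

```python
import json

# Stand-in for the mapping anchored as &defaults.
defaults = {"timeout": 30, "retries": 3}


def apply_merge(base, own_keys):
    """Mimic `<<: *base`: copy the anchored mapping, then let local keys win."""
    merged = dict(base)
    merged.update(own_keys)
    return merged


config = {
    "defaults": defaults,
    "production": apply_merge(defaults, {"host": "prod.example.com"}),
    "staging": apply_merge(defaults, {"host": "staging.example.com"}),
}

as_json = json.dumps(config, indent=2)
print(as_json)

# The serialized text repeats the defaults once per mapping that used them.
print(as_json.count('"timeout": 30'))
```

The count grows by one for every mapping that merges the defaults, which is the duplication described above.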
Converting YAML to JSON Programmatically
For bulk conversions, build pipeline integrations, or any kind of automated processing, you'll need a command-line or scripted solution. A web tool is great for one-offs, but automation demands code. These are the most reliable ways to do it.

**Python (most portable option):**

```python
import json
import sys

import yaml  # PyYAML

# Read the YAML file named by the first command-line argument.
with open(sys.argv[1], "r", encoding="utf-8") as f:
    data = yaml.safe_load(f)  # expects a single-document file

print(json.dumps(data, indent=2, ensure_ascii=False))
```

Always use `yaml.safe_load()`. Loading untrusted input with `yaml.load()` and an unsafe loader is a security nightmare, because a malicious YAML file can make it execute arbitrary code. Note that `safe_load()` raises an error on multi-document files; use `yaml.safe_load_all()` for those. The `ensure_ascii=False` argument is also a good habit, as it preserves Unicode characters instead of escaping them.

**Node.js:**

```javascript
const fs = require('fs');
const yaml = require('js-yaml');

// Parse the file given as the first argument and print pretty JSON.
const data = yaml.load(fs.readFileSync(process.argv[2], 'utf8'));
console.log(JSON.stringify(data, null, 2));
```

Since v4.0, `js-yaml`'s `load` is safe by default and parses numbers according to YAML 1.2. If you're working in an older project, double-check your `package.json`. Versions before 4.0 parse numbers with YAML 1.1 rules, so an unquoted `0755` is read as octal, and their plain `load` accepts JavaScript-specific tags; use `safeLoad` there.

**yq (command-line tool):**

```bash
yq -o=json eval '.' input.yaml > output.json
```

Frankly, `yq` is the best tool for this job on the command line. The command above uses the syntax of the Go-based `yq` (v4); a separate Python tool with the same name takes different options. It's a purpose-built YAML processor that handles multi-document files, anchors, and merge keys, with a simple flag for JSON output. Install it with Homebrew on macOS (`brew install yq`) or grab the binary from GitHub for Linux/Windows.

Of course, for a quick conversion without installing anything, the [CocoConvert YAML to JSON tool](/convert/yaml-to-json) is still the fastest way to get it done.
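The effect of `ensure_ascii` is easy to see with the standard `json` module alone. The sketch below serializes an already-parsed mapping both ways; the `data` value is a hand-written stand-in for parser output, chosen only for illustration.

```python
import json

# Stand-in for data returned by a YAML parser (illustrative value).
data = {"city": "Zürich"}

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keeps the character as-is

print(escaped)   # {"city": "Z\u00fcrich"}
print(readable)  # {"city": "Zürich"}

# Both forms decode to the same data, so the choice only affects the file's text.
assert json.loads(escaped) == json.loads(readable) == data
```

When writing the readable form to disk, open the output file with `encoding="utf-8"` so the result does not depend on the platform's default encoding.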
Validating Your Output Before Using It
Converting a file without validating it is a recipe for introducing subtle bugs into production. A JSON file can be perfectly valid syntactically but contain semantically wrong data, like the type coercions we've discussed. Here's a practical checklist to save you from future headaches.

**Syntax validation.** At a minimum, run the output through a JSON linter. Your code editor (like VS Code or a JetBrains IDE) probably does this automatically. From the command line, Python's built-in `json.tool` is a reliable workhorse: `python3 -m json.tool output.json > /dev/null`. It exits with code 0 for valid JSON and tells you exactly where it broke on failure.

**Schema validation.** For critical files, use a schema. If your target format has a JSON Schema (common for OpenAPI specs, AWS CloudFormation, and Kubernetes CRDs), validate against it. A tool like `ajv-cli` (`ajv validate -s schema.json -d output.json`) will catch type mismatches that a simple syntax check can't see.

**Diff against a known good version.** When you have a reference JSON file, diffing is essential. But first, normalize the key order to avoid noisy, meaningless differences. The `jq` tool can sort keys deterministically: `jq --sort-keys . output.json > normalized.json`. Remember, key order in JSON doesn't matter, but it will drive you crazy when you're trying to compare files.

**Spot-check for coerced types.** If you suspect your YAML had values like '1.0' or '0755', check the JSON output directly. A quick `grep -n "0755" output.json` will tell you instantly if your octal string survived the conversion or was turned into a useless integer.

Seriously, taking five minutes to validate your output before a commit or deployment is always faster than debugging a production incident caused by a boolean that was supposed to be a string.
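The syntax check, the order-insensitive comparison, and the type spot-check can also be combined in a few lines of standard-library Python. This is a minimal sketch: `check_types` and its `expected` mapping of top-level keys to Python types are hypothetical names chosen for illustration, the two JSON strings are made-up sample data, and a JSON Schema validator remains the better tool for nested structures.

```python
import json


def check_types(data, expected):
    """Return messages for top-level keys whose value is not the expected type."""
    problems = []
    for key, wanted in expected.items():
        value = data.get(key)
        # bool is a subclass of int in Python, so reject it explicitly
        # whenever something other than bool was expected.
        wrong_bool = isinstance(value, bool) and wanted is not bool
        if wrong_bool or not isinstance(value, wanted):
            problems.append(
                f"{key}: expected {wanted.__name__}, got {type(value).__name__}"
            )
    return problems


# json.loads raises json.JSONDecodeError on invalid syntax, so getting past
# these two lines means both documents are well-formed.
converted = json.loads('{"mode": 493, "enabled": true, "name": "api"}')
reference = json.loads('{"name": "api", "enabled": true, "mode": "0755"}')

# Dict comparison ignores key order, so only real differences show up.
print(converted == reference)
print(check_types(converted, {"mode": str, "enabled": bool, "name": str}))
```

Here the comparison fails and the type check points at `mode`, which is the kind of coerced permission value discussed earlier.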