
What Is YAML? The Human-Friendly Data Language

2026-05-17 9 min read

YAML in Plain Terms

YAML's full name is 'YAML Ain't Markup Language', a recursive jab that tells you what it isn't. It's not for marking up documents like HTML. Instead, YAML is a data serialization format designed to be read and edited by a person without a manual open in the next tab. It first appeared in 2001, created by Clark Evans, Ingy döt Net, and Oren Ben-Kiki. Since then, it has become the default configuration language for tools like Kubernetes, Ansible, GitHub Actions, Docker Compose, and Ruby on Rails.

At its heart, YAML uses indentation to represent data structures like key-value pairs and lists. In the usual block style, you won't find angle brackets or curly braces. A simple Docker Compose file shows this clearly:

    version: '3.9'
    services:
      web:
        image: nginx:latest
        ports:
          - '80:80'

Set that against the equivalent JSON and the appeal is obvious. No commas, no quotes on keys, and no closing braces to mismatch. The trade-off is that indentation becomes significant. Get it wrong by just one space, and the file might fail to parse or, worse, silently mean something else entirely. This tension between easy reading and strict structure defines the YAML experience. Getting comfortable with it is the key to using the format well.
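To make the JSON comparison concrete, here is a minimal sketch that writes the Compose example as the plain Python data a YAML parser would produce and prints its JSON form. It uses only the standard `json` module; the variable names `compose` and `as_json` are illustrative.

```python
import json

# The Compose example, written as the data structure a YAML parser
# would hand back: mappings become dicts, sequences become lists.
compose = {
    "version": "3.9",
    "services": {
        "web": {
            "image": "nginx:latest",
            "ports": ["80:80"],
        }
    },
}

# Serialize with two-space indentation to see the JSON equivalent,
# including the quotes, commas, and closing braces YAML lets you omit.
as_json = json.dumps(compose, indent=2)
print(as_json)
```

Both forms carry the same data; the difference is purely in how much punctuation a person has to type and keep balanced.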

How YAML Structures Data: The Three Building Blocks

Every YAML document is composed of the same basic parts: scalars, sequences, and mappings.

A scalar is a single value, like a string, a number, a boolean, or null. YAML infers the type from how the value is written, which is mostly helpful but can sometimes backfire. An unquoted `yes` is parsed as the boolean `true` under the older YAML 1.1 standard, which is still what PyYAML and other 1.1-based tools implement. YAML 1.2, however, treats it as a plain string. That version gap has caused real-world production bugs, so you need to know which parser your tools are using.

A sequence is an ordered list. You write each item with a leading dash and a space:

    fruits:
      - apple
      - banana
      - mango

A mapping is a set of key-value pairs, which is what you'll see most often in configuration files. You can nest mappings to any depth, and the structure is defined purely by consistent indentation. The YAML spec is clear on this: indentation must use spaces, never tabs.

YAML also offers features for reducing repetition. Anchors (&) and aliases (*) let you define a block of data once and reuse it elsewhere. This is a lifesaver in CI/CD pipelines where multiple jobs might share the exact same block of environment variables.

For long strings, YAML has two block styles: the literal block scalar (|) preserves newlines exactly, while the folded block scalar (>) folds single line breaks into spaces. The folded style suits long shell commands that you want to keep readable in the file without adding actual line breaks to the command itself.
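The `yes` version gap described above can be sketched with two regular expressions. This is an illustration of the resolution rules, not a parser: the first pattern follows the boolean spellings listed in the YAML 1.1 type repository, the second follows the YAML 1.2 core schema, and the names `YAML_1_1_BOOL`, `YAML_1_2_BOOL`, and `resolves_to_bool` are made up for this example.

```python
import re

# Plain (unquoted) scalars that YAML 1.1 resolves to booleans.
# Some 1.1 parsers, PyYAML among them, leave out the one-letter y/n forms.
YAML_1_1_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)

# The YAML 1.2 core schema keeps only the true/false spellings.
YAML_1_2_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")


def resolves_to_bool(plain_scalar: str, version: str = "1.1") -> bool:
    """Return True if an unquoted scalar would be read as a boolean."""
    pattern = YAML_1_1_BOOL if version == "1.1" else YAML_1_2_BOOL
    return pattern.fullmatch(plain_scalar) is not None


# Compare how each version treats a few unquoted values.
for value in ["yes", "NO", "on", "true", "nginx"]:
    print(value, resolves_to_bool(value, "1.1"), resolves_to_bool(value, "1.2"))
```

Quoting the value sidesteps the question entirely: `'yes'` is a string under either version.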

Where YAML Is Actually Used

YAML thrives in the world of infrastructure and developer tooling. As of 2024, GitHub Actions processes over 100 million workflow runs daily, every one driven by a .yml file in a project's .github/workflows directory. Kubernetes, the engine behind most cloud-native applications, relies on YAML for defining every resource: Deployments, Services, ConfigMaps, you name it. A typical microservices application can easily accumulate hundreds of YAML files. Ansible, the IT automation tool used by over 25,000 organizations (according to Red Hat), uses YAML for all of its playbooks. A standard task in an Ansible playbook looks like this:

    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

Beyond the DevOps sphere, you'll find YAML in static site generators like Jekyll, which uses YAML front matter to store metadata in Markdown files. API testing tools like Hoppscotch and Insomnia use it for environment configuration. Even data science pipelines use it; tools such as DVC (Data Version Control) track experiment parameters in YAML files.

One place you won't see much YAML is in data interchange between web services. REST APIs almost universally use JSON for request and response bodies. Why? Because JSON parsers are built into every browser and the format's strictness leaves little room for ambiguity. This is the key distinction: YAML is for humans editing files on disk; JSON is for machines passing data over a network. Remembering that simple rule will prevent a lot of confusion about which format to choose.

YAML vs. JSON vs. TOML: Choosing the Right Format

When picking a configuration format, you're usually choosing between YAML, JSON, and TOML. They each have a distinct personality.

JSON (JavaScript Object Notation) is the strictest. Every string must be quoted, every list and object must be explicitly closed, and there is no support for comments. That last one is a major source of frustration for developers wanting to annotate configuration. But JSON's strength is its rigid, unambiguous parsing; any two compliant parsers will produce identical data structures from the same input. Its file size is generally comparable to YAML for typical configs.

TOML (Tom's Obvious, Minimal Language) was created to fix JSON's lack of comments and YAML's tricky whitespace rules. It uses an INI-style syntax and has been adopted by Rust's Cargo package manager and Python's pyproject.toml standard. TOML is fantastic for flat or shallowly nested configurations, but it gets clumsy and verbose when you need to represent deeply nested data.

So where does YAML fit? YAML is the clear winner when your configuration has significant nesting depth, when you can use anchors and aliases to eliminate repetition, or when non-technical people need to edit the file. It's the wrong choice when your team is constantly fighting indentation errors or when type inference creates surprises (like the infamous Norway problem, where an unquoted country code `NO` is parsed as boolean `false` by YAML 1.1 parsers).

If you need to convert between formats, CocoConvert offers reliable YAML-to-JSON and JSON-to-YAML conversions. It does not support TOML as an output format, however. If you need to get from YAML to TOML, you'll have to reach for a command-line tool like `yq`. It's better to know that upfront.

Common YAML Mistakes and How to Avoid Them

The single most common source of YAML errors is inconsistent indentation. Anyone who has spent an hour debugging a CI pipeline only to find a single misplaced space knows this pain. Like Python, YAML accepts any indent width but requires sibling keys at the same level to have exactly the same indentation. A key that drifts by even one space relative to its siblings will cause a parse error or, much worse, silently restructure your data. The only safe way to work is to configure your editor to insert spaces instead of tabs and enforce a consistent indent size. Two spaces is the de facto standard for Kubernetes and GitHub Actions.

Unquoted special characters are another trap. A colon followed by a space is a key-value separator, so a value like 'Error: disk full' must be quoted. Forget the quotes, and you'll get a parse error. A URL such as http://example.com:8080 happens to be safe unquoted because none of its colons is followed by a space, but quoting it does no harm. Likewise, values starting with `{`, `[`, or `%` need quoting because they have special meaning in YAML syntax.

Then there are the type coercion surprises. We already mentioned how 'no' can become `false`. But did you know that numbers with leading zeros can be parsed as octal integers? Under YAML 1.1, the value 0755, a common Unix file permission, becomes decimal 493 unless you quote it (YAML 1.2 writes octal as 0o755 instead). Dates are another landmine; in many parsers 2024-01-01 without quotes becomes a date object, not a string, which can break tools that expect a string.

The best defense is to lint your YAML before you commit it. `yamllint` is an essential command-line tool that catches indentation errors, trailing spaces, and other common issues. Your CI pipeline should include a `yamllint` step. For a quick one-off check, pasting your file into CocoConvert's YAML-to-JSON converter is a great sanity test. If the resulting JSON structure doesn't look like what you intended, your YAML has a problem.
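As a rough illustration of the kind of indentation check a linter performs, the sketch below flags tabs in leading whitespace and indents that are not a multiple of a chosen width. It is a heuristic, not a YAML parser and not a substitute for `yamllint`: it will also flag content inside block scalars, and the function name `find_indent_problems` and the two-space default are assumptions made for this example.

```python
def find_indent_problems(text: str, width: int = 2) -> list[str]:
    """Report lines whose leading whitespace contains a tab or an odd width."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped or stripped.startswith("#"):
            continue  # blank and comment-only lines carry no structure
        indent = line[: len(line) - len(stripped)]
        if "\t" in indent:
            problems.append(f"line {number}: tab in indentation")
        elif len(indent) % width != 0:
            problems.append(
                f"line {number}: indent of {len(indent)} is not a multiple of {width}"
            )
    return problems


# A deliberately broken sample: line 3 drifts by one space, line 4 uses a tab.
sample = "services:\n  web:\n   image: nginx\n\tports: []\n"
for problem in find_indent_problems(sample):
    print(problem)
```

A check like this is cheap enough to run in a pre-commit hook, with a real linter doing the thorough pass in CI.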

Converting YAML Files: What CocoConvert Can Do

CocoConvert provides tools for the two most common conversion needs: YAML to JSON and JSON to YAML. The process is simple: paste your content or upload a .yaml file, pick your target format, and download the result. The converter accurately preserves your data's nested structure. It also correctly handles multi-document YAML files (where documents are separated by ---), converting each one into a separate JSON object within a larger array.

When converting YAML to JSON, the output is formatted with a standard two-space indentation, making it readable and compatible with nearly all JSON tools. If you need a compact, single-line JSON string (perhaps for an environment variable or to reduce payload size), a minify option is available on the results page.

Going from JSON to YAML, the converter uses two-space indentation and omits the document start marker (---) for single documents. If the input contains multiple JSON objects, each one becomes a distinct YAML document separated by the `---` marker. Crucially, it automatically quotes string values that could be misinterpreted as booleans, nulls, or numbers, so you don't have to worry about a string like 'true' or '1.0' causing problems.

Let's be direct about the limitations. CocoConvert does not preserve YAML comments during conversion. This isn't a tool deficiency; comments are not part of YAML's formal data model, so they are stripped out by the parser. Anchors and aliases are also resolved, meaning the final output will contain repeated values rather than references. Finally, very large files (over 10 MB) might time out on the free tier. For those big jobs, a command-line tool like `yq` is the better choice.
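The two JSON output shapes described above, indented and minified, can be reproduced with Python's standard `json` module. This is only a sketch of the formats, not CocoConvert's implementation; the `documents` list stands in for two already-parsed YAML documents from a multi-document file and its contents are invented for the example.

```python
import json

# Two parsed YAML documents (as if the source file were split on `---`),
# collected into one list, which becomes the enclosing JSON array.
documents = [
    {"kind": "Service", "metadata": {"name": "web"}},
    {"kind": "Deployment", "metadata": {"name": "web"}},
]

# Readable output with two-space indentation.
pretty = json.dumps(documents, indent=2)

# Compact single-line output: no spaces after separators, no newlines.
minified = json.dumps(documents, separators=(",", ":"))

print(pretty)
print(minified)
```

Both strings parse back to the same data, so the choice between them is about readability versus size.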

When to Use YAML and When to Step Back

YAML is the right tool for the job when a few conditions are met. The file will be read and edited by humans, the data has meaningful nested structure, and the surrounding ecosystem already uses YAML. Kubernetes manifests, CI/CD pipelines, and Ansible playbooks are the poster children for this. Trying to use JSON in these scenarios just makes life harder for no real benefit.

Conversely, YAML is the wrong choice when a file is only ever touched by machines. For machine-to-machine communication, use JSON or a more efficient binary format like MessagePack or Protocol Buffers. It's also a poor choice if your team consistently struggles with indentation errors and lacks the discipline to use a linter. In that situation, the simpler syntax of TOML, or even a pre-processed JSON file, will lead to fewer production incidents.

If you're using YAML in Python projects, there is a critical security risk you must understand. PyYAML's `yaml.load()` can construct arbitrary Python objects, and therefore execute arbitrary code embedded in a YAML file, when it runs with an unsafe loader such as `yaml.Loader` or `yaml.UnsafeLoader`. That was the behavior of a bare `yaml.load()` call before PyYAML 5.1; since 6.0 the `Loader` argument is mandatory. If you are parsing YAML from an untrusted source, you must always use `yaml.safe_load()`. This is not a theoretical concern; unsafe YAML deserialization is a well-documented source of remote code execution vulnerabilities.

The format has been with us for over two decades and isn't going anywhere. The YAML 1.2 specification, which fixed most of the annoying type coercion issues from version 1.1, is supported by parsers such as ruamel.yaml for Python, js-yaml 4.x, and Go's yaml.v3 (which keeps some 1.1 behavior for compatibility). PyYAML, including the 6.x releases, still implements YAML 1.1. My strongest recommendation: if you're working on a project that relies heavily on YAML, migrating to a 1.2-compliant parser is the single most impactful upgrade you can make. It will eliminate an entire class of subtle bugs, usually without changing your configuration, though any file that relied on 1.1 behavior (a bare `yes` meaning true, for instance) should be audited first.
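A minimal sketch of the safe-loading advice, assuming PyYAML is installed (`pip install pyyaml`). The wrapper name `parse_untrusted` is hypothetical, and the guarded import exists only so the snippet can be imported on machines without PyYAML.

```python
try:
    import yaml  # PyYAML, a third-party package (assumed to be installed)
except ImportError:
    yaml = None  # lets the sketch be imported even where PyYAML is missing


def parse_untrusted(text: str):
    """Parse YAML from an untrusted source using the safe loader only."""
    if yaml is None:
        raise RuntimeError("PyYAML is required for this example")
    # safe_load builds only standard YAML types (dicts, lists, strings,
    # numbers and the like) and raises an error on Python-specific tags
    # instead of constructing arbitrary objects.
    return yaml.safe_load(text)


if yaml is not None:
    print(parse_untrusted("image: nginx:latest"))
    try:
        # A payload that an unsafe loader would turn into a function call.
        parse_untrusted("!!python/object/apply:os.system ['echo pwned']")
    except yaml.YAMLError as exc:
        print("rejected:", type(exc).__name__)
```

Routing every external input through one wrapper like this makes it easy to grep a codebase for stray `yaml.load()` calls.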