
What Is YAML? The Human-Friendly Data Language

2026-05-17 9 min read

YAML in Plain Terms

YAML stands for "YAML Ain't Markup Language", a recursive acronym that hints at its philosophy: it is not about marking up documents the way HTML does. Instead, YAML is a data serialization format designed so that a person can read and edit it without a manual nearby. The language was first proposed in 2001 by Clark Evans, who developed it together with Ingy döt Net and Oren Ben-Kiki, and it has since become the default configuration language for tools like Kubernetes, Ansible, GitHub Actions, Docker Compose, and Ruby on Rails.

At its core, YAML represents data as key-value pairs, lists, and nested structures using plain indentation rather than angle brackets or curly braces. A short Docker Compose file looks like this:

    version: '3.9'
    services:
      web:
        image: nginx:latest
        ports:
          - '80:80'

Compare that to the equivalent JSON and the difference is immediately obvious: no commas, no quotes on keys, no closing braces to lose track of. The tradeoff is that indentation becomes load-bearing. Get it wrong by even two spaces and the file either fails to parse or silently means something different. That tension between readability and strictness is the defining characteristic of YAML, and understanding it is the first step to using the format confidently.
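To make the point about load-bearing indentation concrete, here is a minimal sketch that shows the same Compose-style file twice, as two documents in one YAML stream. The second document contains a hypothetical editing slip: ports has been dedented by two spaces. It still parses, but the key now belongs to services rather than to web.

```yaml
# Document 1: ports is nested under web, as intended.
services:
  web:
    image: nginx:latest
    ports:
      - '80:80'
---
# Document 2: ports has lost two spaces of indentation.
# It is still valid YAML, but ports is now a sibling of web
# (a second key under services), not a property of web.
services:
  web:
    image: nginx:latest
  ports:
    - '80:80'
```

A YAML parser accepts both documents without complaint. Whether the consuming tool notices the second one is wrong depends on how strictly it validates its input.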

How YAML Structures Data: The Three Building Blocks

Every YAML document is built from three primitives: scalars, sequences, and mappings.

A scalar is any single value: a string, number, boolean, or null. YAML infers the type automatically, which is convenient but occasionally surprising. The unquoted value yes is parsed as boolean true in YAML 1.1 (used by PyYAML and many older tools), while the YAML 1.2 core schema treats it as a plain string. This version gap has caused real production bugs, so it is worth knowing which parser your toolchain uses.

A sequence is an ordered list, written with a leading dash and space:

    fruits:
      - apple
      - banana
      - mango

A mapping is a set of key-value pairs, the most common structure in configuration files. Mappings can be nested to any depth, and that nesting is expressed purely through consistent indentation. YAML forbids tab characters for indentation, so it must be done with spaces.

Beyond these basics, YAML supports anchors and aliases for reuse. Define a block once with an ampersand (&defaults) and reference it elsewhere with an asterisk (*defaults). This is heavily used in CI/CD pipelines where ten jobs share the same environment variables.

YAML also supports multi-line strings through two block styles: the literal block scalar (|), which preserves newlines exactly, and the folded block scalar (>), which folds single line breaks into spaces. The folded style is useful for long shell commands that need to stay readable across multiple lines without actually containing line breaks when executed.
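The following sketch puts anchors, aliases, and the two block scalar styles side by side. The key names, job names, and the example/image container are hypothetical and chosen only for illustration; they are not taken from any particular CI system.

```yaml
# Define the shared environment once and give it the anchor name "defaults".
defaults: &defaults
  NODE_ENV: test
  CI: "true"          # quoted so it stays a string

jobs:
  lint:
    env: *defaults    # alias: same mapping as the anchored block
  test:
    env: *defaults

# Literal block scalar: the value keeps both line breaks.
script_literal: |
  echo "step one"
  echo "step two"

# Folded block scalar: the three lines are joined with single spaces,
# so the value is one long command followed by a final newline.
script_folded: >
  docker run --rm
  --volume "$PWD:/src"
  example/image:latest
```

When a parser loads this file, both jobs see the full environment mapping. The alias is a reference in the source text only, not something the consuming program has to resolve.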

Where YAML Is Actually Used

YAML's adoption is concentrated in infrastructure and developer tooling, and the numbers reflect that. As of 2024, GitHub Actions has over 100 million workflow runs per day, every single one driven by a .yml or .yaml file in the .github/workflows directory. Kubernetes, which orchestrates containers for the majority of cloud-native applications, uses YAML for every resource definition: Deployments, Services, ConfigMaps, Ingress rules, and more. A moderately complex microservices application can have several hundred YAML files.

Ansible, the IT automation platform used by more than 25,000 organizations according to Red Hat's own figures, writes all its playbooks in YAML. A typical playbook task looks like:

    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

Beyond DevOps, YAML appears in static site generators (Jekyll stores front matter in YAML blocks at the top of Markdown files), in API testing tools like Hoppscotch and Insomnia for environment configuration, and in data science pipelines where tools like DVC (Data Version Control) track experiment parameters in YAML.

One area where YAML is notably absent is data interchange between web services. REST APIs almost universally use JSON for request and response bodies because JSON parsers are built into every browser and the format is unambiguous. YAML is for humans editing files on disk; JSON is for machines passing data over a network. Keeping that distinction clear prevents a lot of confusion about when to reach for which format.
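For readers who have not seen a Kubernetes resource definition, a minimal Deployment manifest might look like the sketch below. The name web, the app label, and the replica count are placeholder values picked for illustration. Real manifests usually carry more fields, such as resource limits and probes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # hypothetical resource name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web              # must match the pod template labels below
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:latest
          ports:
            - containerPort: 80
```

Even this small example is five levels deep, which is part of why the nesting-friendly syntax matters for this kind of file.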

YAML vs. JSON vs. TOML: Choosing the Right Format

The three formats that compete most directly for configuration file use cases each have a distinct character.

JSON (JavaScript Object Notation) is the strictest of the three. Every string must be quoted, every list and object must be explicitly closed, and there is no comment syntax whatsoever, a limitation that frustrates developers who want to annotate configuration. JSON's strength is unambiguous parsing: any two compliant parsers will produce identical output from the same input. File size is comparable to YAML for most real-world configs.

TOML (Tom's Obvious, Minimal Language) was created specifically to address JSON's lack of comments and YAML's whitespace sensitivity. It uses an INI-like section syntax and is the format chosen by Rust's Cargo package manager and Python's pyproject.toml standard. TOML is excellent for flat or shallowly nested configs but becomes verbose for deeply nested structures.

YAML wins when the document has significant nesting depth, when anchors and aliases can eliminate repetition, or when non-technical stakeholders need to edit the file. It loses when the indentation rules cause parsing errors that are hard to debug, when the type inference creates surprises (the Norway problem: the unquoted country code NO is parsed as boolean false in YAML 1.1), or when tooling support is weak.

For converting between these formats, CocoConvert handles YAML-to-JSON and JSON-to-YAML conversions reliably for well-formed files. It does not, however, support TOML as a conversion target yet. If you need YAML-to-TOML, you will need a command-line tool like yq combined with a TOML serializer.
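One way to see the YAML and JSON tradeoff is to write the same data in YAML's block style and again in its JSON-compatible flow style. The sketch below does that as two documents in one stream. The hostname and settings are made up for illustration, and TOML is left out because it is a separate syntax that cannot be shown inside a YAML file.

```yaml
# Block style: comments are allowed and structure comes from indentation.
server:
  host: example.internal    # hypothetical hostname
  port: 8080
  tls:
    enabled: true
    protocols:
      - TLSv1.2
      - TLSv1.3
---
# Flow style: the same data written the way JSON would write it.
# (These comment lines are YAML-only; strict JSON has no comment syntax.)
{"server": {"host": "example.internal", "port": 8080,
            "tls": {"enabled": true, "protocols": ["TLSv1.2", "TLSv1.3"]}}}
```

Both documents load to the same data structure. The second is harder to annotate and edit by hand, while the first depends on every line being indented correctly.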

Common YAML Mistakes and How to Avoid Them

The most frequent source of YAML errors is inconsistent indentation. As in Python, any indentation width is allowed, but sibling keys at the same level must use exactly the same number of spaces. Drifting between two-space and four-space indentation inside the same block will either throw a parse error or, worse, silently restructure your data. The safest practice is to configure your editor to insert spaces on Tab and to set a consistent indent size. Two spaces is the convention in Kubernetes and GitHub Actions; four spaces is common in Ansible.

The second trap is unquoted special characters. A colon followed by a space is the key-value separator in YAML, so a value that contains one (for example, Warning: disk almost full) must be quoted or it will cause a parse error. A URL such as http://example.com:8080 is safe unquoted, because none of its colons is followed by a space. Similarly, values starting with curly braces, square brackets, or percent signs need quoting because YAML interprets those characters as flow-style mappings, sequences, or directives respectively.

Type coercion surprises are the third category. Beyond the yes/no boolean issue mentioned earlier, bare numbers with leading zeros are parsed as octal integers by YAML 1.1 parsers: the value 0755 (a Unix file permission) becomes decimal 493 unless quoted. Dates are another landmine: 2024-01-01 without quotes becomes a date object in many parsers, not a string, which breaks tools expecting a string.

A practical defense against all of these is running your YAML through a linter before committing it. yamllint is a command-line tool (pip install yamllint) that catches indentation errors, trailing spaces, and line length issues. Most CI pipelines should include a yamllint step. For one-off validation, pasting your file into CocoConvert's YAML-to-JSON converter and checking whether the output matches your intent is a fast sanity check. If the JSON looks wrong, the YAML has a structural problem.
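The quoting and coercion traps above are easier to remember with a concrete file. The sketch below assumes a YAML 1.1 parser such as PyYAML, and a 1.2 parser would treat some of these values differently. All key names are invented for the example.

```yaml
# Each risky value is shown next to its quoted, safe form.
country: NO                 # YAML 1.1: boolean false
country_safe: "NO"          # string "NO"

mode: 0755                  # YAML 1.1: octal, loads as integer 493
mode_safe: "0755"           # string "0755"

released: 2024-01-01        # loads as a date object in PyYAML
released_safe: "2024-01-01" # string

# A colon followed by a space inside a value needs quotes.
note: "Warning: disk almost full"

# Values that start with an indicator character need quotes too.
template: "{name} logged in"
tag_line: "[draft] release notes"
format: "%d items"
```

A small yamllint configuration, saved as .yamllint in the repository root, can enforce the indentation convention automatically. The specific limits here are only a suggested starting point:

```yaml
extends: default
rules:
  indentation:
    spaces: 2        # require two-space indentation throughout
  line-length:
    max: 120
```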

Converting YAML Files: What CocoConvert Can Do

CocoConvert's YAML converter handles the two most common conversion paths: YAML to JSON and JSON to YAML. The process is straightforward: paste your content or upload a .yaml or .yml file, select the target format, and download the result. The converter preserves nesting depth accurately and handles multi-document YAML files (documents separated by ---) by converting each document as a separate JSON object in an array.

For YAML-to-JSON conversions, the output is formatted with two-space indentation by default, which is readable and compatible with virtually every JSON tool. If you need minified JSON, for embedding in an environment variable or reducing payload size, there is a compact output option on the results page.

For JSON-to-YAML conversions, the output uses two-space indentation and omits the document start marker (---) unless the input contains multiple JSON objects, in which case each becomes a YAML document separated by ---. String values that would be misinterpreted as booleans, nulls, or numbers are automatically quoted in the output, so you do not have to manually add quotes around values like 'true' or '1.0' that you intend as strings.

Honest limitations worth noting: CocoConvert does not preserve YAML comments during conversion. Comments are not part of the data model and get stripped during parsing, which is a fundamental constraint of the format, not a tool deficiency. Anchors and aliases are resolved before output, so the resulting JSON or re-serialized YAML will have the values repeated rather than referenced. Very large files above 10 MB may time out on the free tier; for those, the yq command-line tool (written in Go, available at mikefarah.gitbook.io/yq) is a better choice.
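As an illustration of the features discussed here, the following hypothetical input file combines a comment, an anchor with an alias, and two documents. It is a sketch for thinking through what a conversion does, not captured output from any tool.

```yaml
# Shared settings. This comment is not part of the data model.
base: &base
  image: nginx:latest
  restart: always
web: *base            # alias: refers to the mapping anchored above
---
name: second-document
replicas: 2
```

Going by the behavior described above, a YAML-to-JSON conversion of this file should yield an array of two objects. In the first, web would contain a full copy of the values under base, and the comment would be gone.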

When to Use YAML and When to Step Back

YAML earns its place when three conditions are met: the file will be edited by humans regularly, the data has meaningful nesting, and the tooling ecosystem already expects YAML. Kubernetes manifests, CI/CD pipelines, and Ansible playbooks satisfy all three conditions. Forcing JSON into those contexts would mean losing readability and gaining nothing.

YAML is the wrong choice when the file is generated and consumed entirely by machines with no human editing step. Use JSON or a binary format like MessagePack or Protocol Buffers instead. It is also the wrong choice when the team has recurring parse errors from indentation mistakes and no appetite for linting discipline. In that case, TOML or even a well-structured JSON file with comments stripped by a preprocessor will cause fewer incidents.

For data science and configuration-heavy Python projects, YAML has a security consideration worth knowing: PyYAML's yaml.load() function can construct arbitrary Python objects, and through them execute arbitrary code, when it is used with an unsafe loader on untrusted input. Recent PyYAML releases require an explicit Loader argument, but the unsafe loaders are still available. Always use yaml.safe_load() when parsing YAML from external sources. This is not a theoretical risk: unsafe YAML deserialization has led to real remote code execution vulnerabilities in open source projects.

The format has been around since 2001 and shows no signs of being displaced from its core use cases. The YAML 1.2 specification, which fixed most of the type coercion surprises from 1.1, is supported by parsers including ruamel.yaml for Python, js-yaml 4.x, and Go's gopkg.in/yaml.v3 (which keeps some 1.1 behavior for backward compatibility). PyYAML itself still implements YAML 1.1. Migrating to a 1.2-compliant parser is one of the most impactful upgrades you can make to an existing YAML-heavy project, eliminating an entire class of subtle bugs. Before switching, check for values that relied on 1.1 behavior, such as unquoted yes or no used as booleans, because their meaning will change.
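To show what an unsafe document looks like, here is the well-known shape of a PyYAML-specific payload, with a deliberately harmless command. The key name greeting is arbitrary. The sketch is only meant to show why safe_load matters.

```yaml
# The !!python/object/apply tag asks PyYAML to call a Python function.
# With an unsafe loader (for example yaml.UnsafeLoader), loading this
# document calls os.system with the argument in the list.
# yaml.safe_load() does not recognize the tag and raises a
# ConstructorError instead of running anything.
greeting: !!python/object/apply:os.system ["echo hello from a YAML file"]
```

Nothing about this file looks unusual to a casual reviewer, which is why the safe loader should be the default for any YAML you did not write yourself.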
