What is YAML?

We look at the pros and cons of the language developed in 2001


The increasing popularity of Kubernetes means you've probably heard of YAML because it's the format for Kubernetes configuration files. 

While it's not as ubiquitous as JSON, YAML goes far beyond Kubernetes. The human-readable data-serialisation language was first released in 2001, and it's used in tools from OpenStack to Ansible playbooks.

Originally YAML stood for "Yet Another Markup Language," but it has since been renamed to "YAML Ain't Markup Language" to make it clear that unlike SGML and HTML, both of which are languages for documents, it's designed with data in mind. 

It's a text-based format for declarative configuration information or specifications, and for data serialisation (where you convert complex, structured data into a flat-file format that you can store or transmit and later convert back into the original structure).

Those are the same kinds of things you'd do with XML, but unlike XML or JSON, YAML is designed to be a format that humans can read and write easily, which is why projects like Ansible picked it over other options. The YAML website is easy to read, and it's also valid YAML.

How does YAML work?

YAML borrows features and patterns from a host of other languages to simplify the process of reading and writing it.

You use indentation and new lines to structure a file, so the way it's displayed on your monitor reflects how it will be interpreted, as in Python. You can choose however many spaces of indentation you find most readable, so long as you stay consistent. You cannot, however, use the tab character for indentation, which avoids the problem of different operating systems and editors handling tabs differently, as well as the ongoing spaces versus tabs debate.
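
As a minimal sketch of indentation-based structure (the keys and values are made up for illustration), nesting is expressed purely by leading spaces. Two spaces per level are used here, but any consistent width would work:

```yaml
# Hypothetical application settings; all names are illustrative only.
server:
  host: example.internal   # child of "server" (indented two spaces)
  port: 8080
  tls:
    enabled: true          # child of "tls" (indented four spaces)
database:
  name: inventory          # child of the top-level key "database"
```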

Users may also adopt a more compact format in which the two main data structures, lists and associative arrays (also known as maps), are denoted by [] and {} respectively. This makes YAML effectively a superset of JSON, although JSON is designed for machines, not humans, to read. Incidentally, YAML also has features that are absent from JSON, including comments, which JSON wasn't created to support, although there are workarounds.
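
To illustrate, here is the same hypothetical data written in the indented style, in the compact bracket style, and as plain JSON, which YAML parsers generally accept unchanged. The key names are assumptions chosen for the example:

```yaml
# Indented style: one item per line.
ports:
  - 80
  - 443
labels:
  app: web
  tier: frontend

# Compact style: the same data using [] for lists and {} for maps.
ports_compact: [80, 443]
labels_compact: {app: web, tier: frontend}

# JSON syntax is generally accepted as-is.
as_json: {"app": "web", "ports": [80, 443]}
```

The lines beginning with # are comments, which have no equivalent in JSON.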

These data structures may also be nested to represent more complicated structures, based on those present in Perl. Features are lifted from C, HTML and MIME, as well as mail headers, with colons used to denote key: value pairs.
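
A short sketch of nesting, using a hypothetical inventory of services: a map whose value is a list of maps, each of which holds a further list.

```yaml
# Hypothetical data; service and image names are illustrative only.
services:
  - name: web
    image: example/web:1.0
    ports:
      - 80
      - 443
  - name: worker
    image: example/worker:1.0
    ports: []              # an empty list in the compact style
```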

Users don't have to put quotation marks around most strings and numbers. Simple types such as integers, floats and Booleans are detected by default, and there's built-in support for ISO-formatted dates and times, although you can also declare your own data types.
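
The following sketch shows how unquoted values are typically interpreted. Exact behaviour depends on the parser and the YAML version it implements (date handling in particular varies), and the !colour tag is a hypothetical application-defined type that a parser would need to be taught to construct:

```yaml
name: inventory service     # string, no quotes needed
replicas: 3                 # integer
timeout_seconds: 2.5        # float
debug: false                # Boolean
created: 2021-01-15         # ISO-formatted date (many parsers load this as a date)
version: "1.10"             # quoted so it stays a string rather than a float
port_as_text: !!str 8080    # standard tag forcing the value to be a string
accent: !colour "#ff8800"   # hypothetical custom tag, application-specific
```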

Structures let you store multiple documents in a single file, or refer to content in one part of a document from elsewhere using an anchor (which also lets you duplicate or inherit properties).
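
As an illustration of multiple documents in one file, a line containing only --- starts each new document. The contents below are simplified, hypothetical records rather than complete manifests for any real tool:

```yaml
# First document
---
kind: Service
name: web
---
# Second document in the same file
kind: Deployment
name: web
replicas: 3
```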

That means it's much more flexible than JSON, where the hierarchy is fixed, with each child node having only one parent node. While there's a similar option in XML, the YAML parser automatically expands the references. That way you get a file that's easier to read, and you avoid the errors that creep in when copying and pasting parameters where only a handful of things change between different instances, while external systems don't need to be told about the structure of the YAML file.
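
A sketch of anchors and references for instances that differ in only a few values. The names are hypothetical, and the << merge key is an assumption: it is a widely supported extension rather than something every parser guarantees, so check your tool before relying on it.

```yaml
# "&base" anchors this map so it can be referenced later.
base: &base
  image: example/web:1.0
  replicas: 2
  log_level: info

staging:
  <<: *base          # merge in everything from "base"
  log_level: debug   # then override one value

production:
  <<: *base
  replicas: 6

# A plain reference repeats the anchored map unchanged.
canary: *base
```

A program reading this file sees staging and production as ordinary, fully populated maps, because the parser resolves the references before handing the data over.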

What are the benefits of YAML?

Because the formatting is straightforward and you don't have to worry about closing tags, brackets or quote marks, you can edit YAML in simple text editing tools, and subsections of YAML files are often valid YAML. But there are also plugins to add YAML support to common IDEs like Visual Studio Code and Atom; these can use the YAML Language Server to provide autocomplete and IntelliSense, and there are several YAML linters to check code for correctness.

You can't write YAML that validates itself against a schema the way XML documents can, but if you need to define a schema for your YAML there are languages that let you do that. The combination of YAML and JSON Schema can be powerful: VS Code, the DocFX static web site generator and even the schema for Microsoft's Q# Quantum Chemistry library use them together to achieve a more human-readable version of JSON.
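
Because a JSON Schema is itself just structured data, it can be written in YAML. The following is a minimal sketch with hypothetical property names; it is not taken from any of the projects mentioned above, and how a schema is attached to a file depends on the tool you use:

```yaml
# A JSON Schema (draft-07) expressed in YAML.
$schema: "http://json-schema.org/draft-07/schema#"
type: object
required: [name, replicas]
properties:
  name:
    type: string
  replicas:
    type: integer
    minimum: 1
  debug:
    type: boolean
additionalProperties: false   # reject keys not listed above
```

A validator using this schema would reject a configuration with a missing name, a non-integer replicas value, or any key not listed.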

Using YAML files has advantages over typing in command line options: you can create much more complex structures in YAML and you don't have to deal with long and unwieldy strings of parameters. And because they're files, you can check them into source control systems and track versions and changes. Because line breaks are meaningful in YAML, with each key or list item typically on its own line, it works better with Git-based systems for tracking changes than JSON. That makes it easier to treat configuration as code that you manage, test and consume the same way you do all your other code.

Are there any downsides to YAML?

YAML has its faults, like any language. Because it was designed to be straightforward to read and write, and because indentation is functional, mistakes are easy to make: a simple slip like adding an extra space can change the meaning of your YAML. Longer files can also get complex, which makes typos difficult to find. A typing error could leave you with YAML that is syntactically valid but doesn't do what you intended. A linter won't catch that kind of mistake, and because YAML is declarative, the concept of stepping through code or setting breakpoints to debug doesn't apply.
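
To make these pitfalls concrete, here are three hypothetical documents. All three are valid YAML, but only the first expresses what was intended:

```yaml
# Intended: "logging" is a top-level key.
server:
  port: 8080
logging:
  level: info
---
# Extra indentation: still valid, but "logging" is now nested inside "server".
server:
  port: 8080
  logging:
    level: info
---
# Misspelled key: parses cleanly, but a tool expecting "port" will not find it.
server:
  prot: 8080
```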

Since YAML is more readable than JSON or XML, reading through a YAML file is more likely to reveal errors than it would be in those formats. As the kind of configuration you perform in YAML becomes more central to DevOps, the configurations you're specifying may become more complex and demand more expertise, regardless of the language you're writing them in. Arguably, there are better languages, such as TOML, but these haven't been adopted as widely, so YAML is the one more developers will face.

Higher-level tools will always be easier to work with than reading and writing YAML files directly, and there's a growing choice of those for Kubernetes. These range from Helm, which streamlines installing and managing Kubernetes apps, to managed cloud services and the kubectl command line. Tools such as Pulumi, which let you use familiar programming languages like JavaScript or PowerShell, also fall into this camp. However, YAML is a configuration format used in so many widely used tools and projects that it's worth familiarising yourself with it and understanding its benefits and drawbacks.
