What is YAML?

We look at the pros and cons of a language now in its 18th year

The increasing popularity of Kubernetes means you've probably heard of YAML because it's the format for Kubernetes configuration files, so almost every developer may need to get some familiarity with it.

But while it's not as ubiquitous as JSON, YAML goes far beyond Kubernetes; first released in 2001, it's used in tools from OpenStack to Ansible playbooks.

Originally YAML stood for Yet Another Markup Language; it was renamed YAML Ain't Markup Language to make it clear that, unlike SGML and HTML, which are languages for documents, it's designed for data.

It's a text-based format for declarative configuration information or specifications, and for data serialisation (where you convert complex, structured data into a flat file format you can store or transmit, and can still get back to the original structure).

Those are the same kinds of things you'd do with XML but unlike XML or JSON, it's designed to be a format that humans can read and write easily, which is why projects like Ansible picked it over other options. The YAML website is easy to read and it's also valid YAML code.

How does YAML work?

YAML also adopts useful features and patterns from a range of other programming languages to make it easier to read and write.

As in Python, you use indentation and new lines to structure code, so the way the code is laid out on screen is exactly how it works. You can choose how many spaces to use for each indentation level, so you can pick whatever you find the most readable, as long as you're consistent. But you can't use the tab character for indentation, which avoids problems of different operating systems handling tabs differently (as well as lengthy debates about whether tabs or spaces are better).
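As a rough illustration of indentation as structure, here is a minimal sketch of a settings file. Every key and value is made up for the example; only the layout matters.

```yaml
# Hypothetical application settings; all names are illustrative.
server:                  # a map with two keys, indented two spaces
  host: example.local
  ports:                 # a list nested inside the map
    - 8080
    - 8443
logging:                 # back at the left margin, so a new top-level key
  level: info
```

Two spaces per level is a common choice rather than a rule; four would work just as well, provided sibling entries line up with each other.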

You can also use a more compact format where the two key data types, lists and associative arrays (known as maps), are marked by [] and {} respectively. That makes YAML a superset of JSON, but JSON is designed for machines to read, not humans. (YAML also has features that are missing from JSON, not least comments, which JSON isn't designed to support, although there are ways to work around that.)
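To make the two styles concrete, this sketch writes the same kind of data first in the indented form and then in the compact form. The key names are invented for the example.

```yaml
# Block style: one item per line.
fruits:
  - apple
  - pear
owner:
  name: Sam
  team: platform

# Flow style: the same kind of data using [] and {}.
fruits_compact: [apple, pear]
owner_compact: {name: Sam, team: platform}

# JSON syntax is accepted as well.
owner_json: {"name": "Sam", "team": "platform"}
```

The lines starting with # are comments, which is the feature the JSON version of this data would have to do without.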

Those data types can be nested to represent more complex structures and are based on those in Perl. Other features are borrowed from C, HTML, MIME and even mail headers, with colons used to mark key: value pairs.

The space after the colon is there so that you don't have to put quotes around strings and numbers. Simple types like integers, floats and Booleans are automatically detected, and there's built-in support for ISO-formatted dates and times, but you can also declare your own data types.
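The sketch below shows how a typical parser would be expected to read unquoted values. Exact type detection varies between YAML versions and libraries, so treat the comments as assumptions about common behaviour; the `!point` tag in particular is hypothetical.

```yaml
name: web frontend        # string, no quotes needed
homepage: http://example.com/docs   # the colon has no space after it, so this is one string
replicas: 3               # integer
cpu_share: 0.75           # float
debug: false              # Boolean
checked: 2020-04-08       # ISO 8601 date, where the parser supports timestamps
version: "1.10"           # quoted, so it stays a string instead of the float 1.1
zip_code: !!str 90210     # explicit tag forcing a string
# Hypothetical application-defined type; a parser has to be told how to build it.
location: !point {x: 1, y: 2}
```

Quoting is still available when you need it, as the `version` line shows: automatic detection is convenient until a value that looks like a number is meant to be text.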

Structures let you store multiple documents in a single file or refer to content in one part of the document from elsewhere using an anchor (which also lets you duplicate or inherit properties).

That means it's much more flexible than JSON, where the hierarchy is fixed and each child node has only one parent node. While there's a similar option in XML, the YAML parser automatically expands the references. That way you get a file that's easier to read, and you avoid potential errors from copying and pasting parameters where only a handful of things change between different instances, while external systems don't need to be told about the structure of the YAML file.
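Anchors and multiple documents might look something like this. The environment names and values are invented, and the `<<` merge key is a YAML 1.1 convention that many parsers support but not all, so check your own tooling before relying on it.

```yaml
# Document 1: shared defaults, reused through an anchor (&) and aliases (*).
defaults: &defaults
  image: example/app:1.0
  memory: 256Mi

staging:
  <<: *defaults          # merge key: copy every pair from defaults
  replicas: 1

production:
  <<: *defaults
  replicas: 5
  memory: 1Gi            # overrides the inherited value
---
# Document 2: a separate document in the same file.
contacts:
  - &admin ops@example.com
notify: *admin           # alias pointing at the same address
```

The program reading the file sees `staging` and `production` as ordinary maps with the defaults already filled in; the anchors exist only for the person editing the file.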

What are the benefits of YAML?

Because the formatting is straightforward and you don't have to worry about closing tags, brackets or quote marks, you can edit YAML in simple text editing tools, and subsections of YAML files are often valid YAML. But there are also plugins to add YAML support to common IDEs like Visual Studio Code and Atom; these can use the YAML Language Server to provide autocomplete and IntelliSense, and there are several YAML linters to check code for correctness.

You can't write YAML that validates itself the way XML documents can, based on a schema, but if you need to define a schema for your YAML there are languages that let you do that. The combination of YAML and JSON Schema can be powerful: VS Code, the DocFX static website generator and even the schema for Microsoft's Q# Quantum Chemistry library use them together to achieve a more human-readable version of JSON.
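Because JSON Schema is itself just structured data, a schema can be written in YAML too. This is a minimal sketch for an imaginary settings file, not the schema any of the projects above actually use; the property names are assumptions.

```yaml
# Hypothetical JSON Schema, written in YAML, for a small settings file.
$schema: "http://json-schema.org/draft-07/schema#"
type: object
required: [name, replicas]
properties:
  name:
    type: string
  replicas:
    type: integer
    minimum: 1
  debug:
    type: boolean
additionalProperties: false   # reject keys that aren't listed above
```

A validator given this schema would be expected to reject a file where `replicas` is missing, zero or written as text, which catches a class of typos that a syntax check alone can't.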

Using YAML files has advantages over typing in command line options: you can create much more complex structures in YAML and you don't have to deal with long and unwieldy strings of parameters. And because they're files, you can check them into source control systems and track versions and changes. Because YAML treats lines as information, it works better with git-based systems for tracking changes than JSON. That makes it easier to treat configuration as code that you manage, test and consume the same way you do all your other code.
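For a sense of what such a file looks like in practice, here is a small Kubernetes Deployment manifest. The application name and container image are placeholders chosen for the example.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # hypothetical application name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.17  # placeholder image
          ports:
            - containerPort: 80
```

Since each setting sits on its own line, changing `replicas` from 3 to 5 shows up in source control as a one-line diff, which is the kind of change tracking the paragraph above describes.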

Are there any downsides?

Like any language, YAML has its faults and its detractors. It was explicitly designed to be simple and straightforward to read and write, but because indentation is functional, it's easy to make a mistake and change what your YAML code does by adding or missing a space. Long YAML files quickly get complex and typos can be hard to find. If a typing error leaves your code syntactically valid but not doing what you want, a linter won't help, and because it's a declarative language there's no concept of stepping through code or setting breakpoints to debug it.
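The two documents below differ by only two spaces, yet they describe different data. The keys are invented for the illustration.

```yaml
# Intended: "timeout" belongs to the database settings.
database:
  host: db.example.local
  timeout: 30
---
# Mistake: two missing spaces turn "timeout" into a top-level key.
# This is still valid YAML, so a syntax check passes.
database:
  host: db.example.local
timeout: 30
```

An application reading the second version would most likely fall back to its default timeout without complaint, which is why this kind of slip can be hard to track down.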

But because it's much more readable than JSON or XML, you're more likely to be able to spot what's wrong by reading through a YAML file. And the problem of complexity is a sign of underlying changes in IT rather than flaws in YAML itself. As the kind of configuration you do in YAML becomes more central with the adoption of devops, the configurations you're specifying are going to become more complex and demand more expertise, whatever language you're writing them in. There are arguably better languages like TOML but they haven't been adopted widely, so YAML is what an increasing number of developers will be faced with.

Higher level tools are always going to be easier to work with than reading and writing YAML files by hand, and there's an ever-growing selection of those for Kubernetes: the kubectl command line, tools like Helm that streamline installing and managing Kubernetes apps, managed cloud services like Azure Kubernetes Service, and tools like Pulumi that use familiar programming languages like JavaScript or Python. But YAML is the configuration format for so many popular tools and projects that it's worth getting familiar with it and understanding what it's good at as well as its quirks.
