YAML is an underrated data serialization language. JSON is far more common these days for configuration management, probably because it's the de facto standard for JavaScript. However, that doesn't explain why so many other languages also choose it when YAML is, well, just plain better.

The purpose of this article is to highlight and demonstrate some of the features you may not have known beyond the basic syntax. If you have never seen YAML before, there is a sample document on Wikipedia.

1. YAML is a Superset of JSON

Yes, that's right. You could paste your JSON directly into a YAML file and it would resolve exactly the same through YAML parsers. JSON can be embedded anywhere inside YAML. For example:

foo:
  bar: {"baz": [123, 456]}

Would be the same as writing:

foo:
  bar:
    baz:
      - 123
      - 456

Using JSON in place is often useful for a few reasons:

  1. You're building a configuration file from externally provided JSON, so there's no need to convert the JSON you receive.
  2. A JSON object might represent a complex type that reads best on one line, like a position: {"x": 10, "y": 20, "z": 15}.
  3. You're lazy and would just rather type JSON.
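To check the equivalence claimed above, here is a minimal sketch in Python. It assumes the third-party PyYAML package is installed (`import yaml`); the variable names are made up for illustration.

```python
import json

import yaml  # PyYAML, assumed to be installed

flow_style = """\
foo:
  bar: {"baz": [123, 456]}
"""

block_style = """\
foo:
  bar:
    baz:
      - 123
      - 456
"""

# Both spellings should load to the same nested structure.
flow_data = yaml.safe_load(flow_style)
block_data = yaml.safe_load(block_style)

# A plain JSON document can be handed to the YAML parser as-is.
raw_json = '{"x": 10, "y": 20, "z": 15}'
json_via_yaml = yaml.safe_load(raw_json)
json_via_json = json.loads(raw_json)

print(flow_data == block_data)
print(json_via_yaml == json_via_json)
```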

2. Multidocuments

YAML supports multiple documents in a single file. These are separated by three dashes (---):

---
time: 20:03:20
player: Sammy Sosa
action: strike (miss)
---
time: 20:03:47
player: Sammy Sosa
action: grand slam
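Reading such a stream back might look like the sketch below. It assumes PyYAML, whose `safe_load_all` yields one object per document; the variable names are illustrative only.

```python
import yaml  # PyYAML, assumed to be installed

stream = """\
---
time: 20:03:20
player: Sammy Sosa
action: strike (miss)
---
time: 20:03:47
player: Sammy Sosa
action: grand slam
"""

# safe_load_all returns a generator that yields one object per document.
events = list(yaml.safe_load_all(stream))
actions = [event["action"] for event in events]

print(len(events))
print(actions)
```

One caveat: PyYAML follows YAML 1.1, where an unquoted value such as 20:03:20 can be resolved as a base-60 number rather than a string. If you need the text, quoting the value is the safe option.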

3. Comments

Some JSON parsers support embedded comments by using JavaScript comments. However, the vast majority of them do not since it's not part of the JSON standard.

All YAML parsers support comments with a #, which is invaluable when you need to explain a setting in the configuration itself (the most appropriate place to put this documentation):

dev:
  # We have to use 1234 on development because...
  redis_options: {"port": 1234}

Comments do not have to be placed on their own line:

message:
  subject: Hi # The message title

4. Complex Keys

Traditionally your keys are limited to anything that can be represented as a string. YAML lets you create complex keys by using any data type as the key itself:

? - Detroit Tigers
  - Chicago cubs
:
  - 2001-07-23

This is not a shorthand for writing multiple keys that have the same value. The key itself is an array. Another example using a different syntax:

? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12,
    2001-08-14 ]
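Whether a complex key survives loading depends on the host language. The sketch below assumes PyYAML: `yaml.compose` stops at the node graph, so the sequence key is visible there, while building Python objects fails because a list cannot be a dict key. Treat it as an illustration of one parser's behaviour, not a statement about all of them.

```python
import yaml  # PyYAML, assumed to be installed

document = """\
? - Detroit Tigers
  - Chicago cubs
:
  - 2001-07-23
"""

# compose() returns the node graph, before any Python objects are built.
root = yaml.compose(document)
key_node, value_node = root.value[0]
key_items = [item.value for item in key_node.value]

# Constructing Python objects fails: a list is unhashable,
# so it cannot serve as a dictionary key.
try:
    yaml.safe_load(document)
    load_error = None
except yaml.YAMLError as exc:
    load_error = exc

print(type(key_node).__name__, key_items)
print("load failed:", load_error is not None)
```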

5. Multiline Text

Multiline text, especially when it needs to retain whitespace, can be particularly tricky to maintain. YAML provides several different syntaxes for multiline text.

name: Mark McGwire
accomplishment: >
  Mark set a major league
  home run record in 1998.
stats: |
  65 Home Runs
  0.278 Batting Average

  • > folds newlines into spaces and strips the block's indentation, so the value of accomplishment is the single line Mark set a major league home run record in 1998. (plus one trailing newline).
  • | maintains newlines, but still removes indentation from the block, so the value of stats is 65 Home Runs\n0.278 Batting Average\n
  • |- works like | but does not append the last newline.
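The three block styles can be compared side by side with a short sketch, again assuming PyYAML. The `stripped` key is added here purely for illustration.

```python
import yaml  # PyYAML, assumed to be installed

document = """\
accomplishment: >
  Mark set a major league
  home run record in 1998.
stats: |
  65 Home Runs
  0.278 Batting Average
stripped: |-
  65 Home Runs
  0.278 Batting Average
"""

values = yaml.safe_load(document)

# repr() makes the newlines visible.
for key, text in values.items():
    print(key, repr(text))
```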

6. More Data Types

null, numbers, strings, arrays and maps are supported just like JSON, but YAML is also able to recognise other language-specific types such as dates and timestamps:

canonical: 2001-12-15T02:59:43.1Z
iso8601: 2001-12-14t21:59:43.10-05:00
spaced: 2001-12-14 21:59:43.10 -5
date: 2002-12-14

And special numerical values:

negative infinity: -.inf
not a number: .NaN

Even more application-specific data types can be supported with tags (see section 9).
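How the dates and special numbers above arrive in a host language depends on the parser. As a sketch, assuming PyYAML, they become `datetime` and `float` objects:

```python
import datetime
import math

import yaml  # PyYAML, assumed to be installed

document = """\
canonical: 2001-12-15T02:59:43.1Z
date: 2002-12-14
negative infinity: -.inf
not a number: .NaN
"""

values = yaml.safe_load(document)

# Print each value with its Python type.
for key, value in values.items():
    print(key, type(value).__name__, value)
```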

7. Unicode

YAML files are Unicode text, but when your editor or toolchain can't be trusted with non-ASCII or control characters, double-quoted strings let you escape them safely:

unicode: "Sosa did fine.\u263A"
control: "\b1998\t1999\t2000\n"
hex esc: "\x0d\x0a is \r\n"
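A quick way to confirm what these escapes decode to is to load them, here with PyYAML as an assumed parser. The raw string keeps Python from interpreting the backslashes before YAML sees them.

```python
import yaml  # PyYAML, assumed to be installed

# Raw string: the backslashes must reach the YAML parser untouched.
document = r'''
unicode: "Sosa did fine.\u263A"
control: "\b1998\t1999\t2000\n"
hex esc: "\x0d\x0a is \r\n"
'''

values = yaml.safe_load(document)

# repr() shows the decoded control characters explicitly.
for key, text in values.items():
    print(key, repr(text))
```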

8. Variables

YAML isn't just static; you can define variables to be used elsewhere. Prefix a value with & to define a variable (an anchor) and reference it later with * (an alias). The special << key merges the referenced map into the current one:

default: &DEFAULT
  database_name: master

development:
  <<: *DEFAULT
  database_user: dev

production:
  <<: *DEFAULT
  database_user: live
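After loading, each environment should carry both its own keys and the shared ones. A minimal sketch, assuming PyYAML (which supports the << merge key):

```python
import yaml  # PyYAML, assumed to be installed

document = """\
default: &DEFAULT
  database_name: master

development:
  <<: *DEFAULT
  database_user: dev

production:
  <<: *DEFAULT
  database_user: live
"""

config = yaml.safe_load(document)

# The merged maps contain the shared key plus their own.
print(config["development"])
print(config["production"])
```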

9. Tags

The simplicity of YAML means you don't have to specify the data type of each value. When you do need to be explicit, you can use tags (prefixed with !!) to specify the type of a value:

not-date: !!str 2002-04-28
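The effect of the tag is easiest to see next to an untagged value. This sketch assumes PyYAML; the `a-date` key is made up for comparison.

```python
import datetime

import yaml  # PyYAML, assumed to be installed

tagged = yaml.safe_load("not-date: !!str 2002-04-28")
untagged = yaml.safe_load("a-date: 2002-04-28")

# With !!str the value stays text; without it the parser resolves a date.
print(repr(tagged["not-date"]))
print(repr(untagged["a-date"]))
```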

This can be extended to your own custom types which makes it extremely powerful for serialising the type of an object along with the values that represent it:

%TAG ! tag:clarkevans.com,2002:
--- !shape
  # Use the ! handle for presenting
  # tag:clarkevans.com,2002:circle
- !circle
  center: &ORIGIN {x: 73, y: 129}
  radius: 7
- !line
  start: *ORIGIN
  finish: { x: 89, y: 102 }
- !label
  start: *ORIGIN
  color: 0xFFEEBB
  text: Pretty vector drawing.
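On the application side, a custom tag has to be mapped to a type by hand. The sketch below shows one way to do that with PyYAML. The `Circle` class, the `ShapeLoader` subclass and the `construct_circle` function are hypothetical names, and the example uses a plain local `!circle` tag without the %TAG directive to keep it short.

```python
from dataclasses import dataclass

import yaml  # PyYAML, assumed to be installed


@dataclass
class Circle:
    """Hypothetical application type for the !circle tag."""
    center: dict
    radius: int


class ShapeLoader(yaml.SafeLoader):
    """Loader subclass, so the custom tag is not registered globally."""


def construct_circle(loader, node):
    # deep=True builds nested maps (such as center) before we use them.
    fields = loader.construct_mapping(node, deep=True)
    return Circle(**fields)


ShapeLoader.add_constructor("!circle", construct_circle)

document = """\
- !circle
  center: {x: 73, y: 129}
  radius: 7
"""

shapes = yaml.load(document, Loader=ShapeLoader)
print(shapes)
```

Subclassing SafeLoader keeps the safe defaults for everything else, so only the tags you register explicitly can produce custom objects.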

If the host language and parser support it, there are some known tags to represent special data types:

# Sets are represented as a mapping where each key is
# associated with a null value
--- !!set
? Mark McGwire
? Sammy Sosa
? Ken Griff

# Ordered maps are represented as a sequence of
# mappings, with each mapping having one key
--- !!omap
- Mark McGwire: 65
- Sammy Sosa: 63
- Ken Griffy: 58
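What these tags turn into depends on the parser. As a sketch, assuming PyYAML, !!set loads as a Python set and !!omap as a list of key/value tuples:

```python
import yaml  # PyYAML, assumed to be installed

stream = """\
--- !!set
? Mark McGwire
? Sammy Sosa
? Ken Griff
--- !!omap
- Mark McGwire: 65
- Sammy Sosa: 63
- Ken Griffy: 58
"""

# Two documents in the stream, so two objects come back.
players, ranking = yaml.safe_load_all(stream)

print(type(players).__name__, sorted(players))
print(ranking)
```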

10. Better for Version Control

The style of YAML natively makes it perfect for version controlling since nested elements are represented by increasing indentation. Diffs generated from YAML changes are very clear and not subject to differing code styles.

What About the Disadvantages?

I think the most obvious disadvantage is that most software has settled on JSON for configuration simply because it is so much more widely known, which should not be confused with being "the best solution".

YAML does not make a good interchange format (for example, as the response format of an API) because it is sensitive to spacing. But it is still excellent for configuration.

If you're writing an application I would urge you to use YAML. Don't even get me started on XML configuration.


Thank you for reading. I'd really appreciate any and all feedback, please leave your comments below or consider subscribing. Happy coding.

