YACYAML

Posted on Jun 7, 2012

Today I posted YACYAML, the Cocoa YAML parser and object archiver I’ve been working on for a little while, to GitHub. It converts Cocoa objects to and from YAML, a plain-text, human-friendly data serialization format.

YACYAML can be used in lots of ways - from replacing plists or JSON for simple config files, up to storing entire custom documents in an easy to view (and easy to hand-edit) format.

Why have I made this?

The short answer is that I like YAML. It’s ‘nicer’ than plists and JSON to edit, and it’s far, far nicer to look at than NSKeyedArchiver’s binary representation (which, to be fair, wasn’t meant to be human-readable in the first place, though I think it would be good if it were).

The emotional appeal

The story behind this is that I was writing yet another configuration file for an iOS app. This one was to configure how CSS generic font families should map to specific on-device fonts in Eucalyptus 2’s UI, given a user’s specific font selection. For example, and simplifying a bit, if the user’s chosen “Baskerville”, the renderer should map “serif” to “Baskerville”, but leave “fixed” things in “Courier”.

This could obviously be specified with a configuration plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Baskerville</key>
    <dict>
        <key>serif</key>
        <string>Baskerville</string>
        <key>fixed</key>
        <string>Courier</string>
…

I really dislike plists, though. They’re verbose and hard to edit (and don’t get me started on how they [ab]use XML’s nesting…). The thought of yet another plist in the app saddened me. It would be easier to just hard-code this in ObjC.

The obvious solution nowadays is to use JSON:

{ "Baskerville": { "serif": "Baskerville", "fixed": "Courier"…

Much, much easier to edit, and much less verbose. If it wasn’t for the fact that I’d used YAML before, perhaps I would have stopped there and used JSON.

I have used YAML before though, so I know things can be better. For all its brevity and ease of editing, JSON still feels like it’s written for a computer. Here’s a YAML representation of the same thing:

Baskerville:
    serif: Baskerville
    fixed: Courier
…

This is obviously much more human-friendly than plists, and, I would say, than JSON. YAML is, in the words of yaml.org, “A human friendly data serialization standard”, and I think that’s a good description.

You might have noticed there that I said that was a YAML representation of the same thing. I could also have written:

{ "Baskerville": { "serif": "Baskerville", "fixed": "Courier"…

Yes, that’s the same as the JSON. YAML is flexible, and is in fact a superset of JSON. Originally that was largely serendipity (YAML came first), but the YAML 1.2 spec deliberately and formally makes JSON a subset.

When the creators of YAML had a choice, they always tried to specify what would be easier for humans, not for machines. In many ways, YAML looks just like a plain text document any human might intuitively sit down and write as a list or outline. To put it the other way around, any plain text document any human might intuitively sit down and write as a list or outline is probably at least pretty close to being valid YAML.

YAML is strongly typed, though you might not guess so from a simple document. Sequences must be decoded as arrays, mappings as dictionaries, numbers as number objects or literals, and booleans as boolean objects or literals. How would you represent a number as a string? In the most obvious way: put quotes around it.
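To make the scalar typing rules a little more concrete, here’s a rough sketch in Python (not YACYAML code, which is Objective-C) of how an unquoted ‘plain’ scalar might be resolved to a type. It loosely follows the YAML 1.2 core schema; the function name resolve_scalar and the exact regexes are my own simplifications. Sequences and mappings don’t need this step, since their type comes from the syntax itself. Note that YAML 1.1 also accepts words like yes, no, y and n as booleans, which is presumably why the NSButton example later in this post contains y.

```python
import math
import re

# Simplified patterns, loosely following the YAML 1.2 "core schema".
_NULL = re.compile(r"~|null|Null|NULL|")
_TRUE = re.compile(r"true|True|TRUE")
_FALSE = re.compile(r"false|False|FALSE")
_INT_DEC = re.compile(r"[-+]?[0-9]+")
_INT_OCT = re.compile(r"0o[0-7]+")
_INT_HEX = re.compile(r"0x[0-9a-fA-F]+")
_FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")
_INF = re.compile(r"[-+]?\.(inf|Inf|INF)")
_NAN = re.compile(r"\.(nan|NaN|NAN)")


def resolve_scalar(text: str, quoted: bool = False):
    """Map a scalar's source text to a typed Python value (hypothetical helper)."""
    if quoted:
        return text  # quoting always forces a string: '42' stays "42"
    if _NULL.fullmatch(text):
        return None
    if _TRUE.fullmatch(text):
        return True
    if _FALSE.fullmatch(text):
        return False
    if _INT_DEC.fullmatch(text):
        return int(text)
    if _INT_OCT.fullmatch(text):
        return int(text[2:], 8)
    if _INT_HEX.fullmatch(text):
        return int(text[2:], 16)
    if _FLOAT.fullmatch(text):
        return float(text)
    if _INF.fullmatch(text):
        return -math.inf if text.startswith("-") else math.inf
    if _NAN.fullmatch(text):
        return math.nan
    return text  # anything else is a plain string


print(resolve_scalar("42"))               # 42 (an int)
print(repr(resolve_scalar("42", True)))   # '42' (a string)
```

The order of the checks matters: integers are tried before floats so that 42 doesn’t come back as 42.0, and quoting short-circuits everything.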

Red flags are perhaps going up about now. You probably think a data representation language that’s designed for humans is a crazy idea; its very flexibility would paradoxically make it hard to edit, and hard to understand. I did too initially. You’d be wrong though. YAML is unambiguous - a rigorous spec means that for any YAML document, there’s one way to parse it - which I think is the property that saves it in this regard.

All this does make it quite complex to parse, yes (this is perhaps the major criticism of YAML), but let’s be honest: in these days where we’re all carrying around supercomputers in our pockets, “quite complex” is a very relative term.

In some ways, I would say YAML is to data representation what Markdown is to text markup (a strange comparison for me to make, since I’m not, to be honest, Markdown’s biggest fan, but it feels true nonetheless).

As an aside, I do find it curious that something that’s easier for humans to ‘parse’ would be harder for computers to parse. Isn’t the human mind mysterious?

The Technical Appeal

There are three different native ways to archive a Cocoa object nowadays. For simple types, we use plists or JSON. For complex types, we use NSKeyedArchives.

Why are there three representations? Plists and JSON are both simple, ‘straight-line’ representations. There’s no way of encoding repeated copies of the same object, or cycles in the object graph; they can only represent trees. They also can’t encode types that are not native ‘plist types’ or JSON types. Because of these two issues, they fundamentally can’t natively represent arbitrary encoded objects.
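The tree-only limitation is easy to demonstrate. Here’s a small Python sketch that uses the standard json module as a stand-in for JSON encoders generally (I haven’t reproduced the plist case, and the dictionary keys are just made-up illustrations): a dictionary referenced twice comes back as two separate copies, and a cycle can’t be written at all.

```python
import json


def shared_reference_survives() -> bool:
    """Encode a dict that holds the same object twice, then decode it."""
    font = {"name": "LucidaGrande", "size": 12}
    cell = {"support": font, "alternate": font}  # one object, two references
    decoded = json.loads(json.dumps(cell))
    # The values are still equal, but they are now two distinct objects.
    assert decoded["support"] == decoded["alternate"]
    return decoded["support"] is decoded["alternate"]


def cycle_is_encodable() -> bool:
    """Try to encode a structure that refers back to itself."""
    button = {"title": "click me"}
    button["cell"] = {"control_view": button}  # button -> cell -> button
    try:
        json.dumps(button)
    except ValueError:  # json reports "Circular reference detected"
        return False
    return True


print(shared_reference_survives())  # False
print(cycle_is_encodable())         # False
```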

NSKeyedArchives are richer. They can represent arbitrary ObjC classes. They can also encode that the same object is used in different places in the tree, or even in cycles (with some caveats that mean that decoding cycles is quite complex, and, of course, potentially fraught with retain-cycle peril).
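Much of the trick to decoding shared objects and cycles is ordering. Here’s a minimal Python sketch, assuming a hypothetical parser that has already produced MapNode and Alias objects (my invention for illustration, not YACYAML’s or libyaml’s API): each anchored container is registered before its children are decoded, so an alias that points back up the tree, like NSControlView: *a in the NSButton example below, finds the half-built parent. Python’s garbage collector copes with the resulting cycle; in ObjC, a strong reference cycle like this leaks unless one side is weak, which is where the retain-cycle peril comes in.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class MapNode:
    """A parsed YAML mapping, optionally carrying an anchor (&name)."""
    items: Dict[str, Any]
    anchor: Optional[str] = None


@dataclass
class Alias:
    """A parsed YAML alias (*name)."""
    name: str


def build(node, anchors=None):
    """Turn parsed nodes into Python objects, preserving sharing and cycles."""
    if anchors is None:
        anchors = {}
    if isinstance(node, Alias):
        return anchors[node.name]  # KeyError if the alias precedes its anchor
    if isinstance(node, MapNode):
        result = {}
        if node.anchor is not None:
            # Register the (still empty) container first, so aliases found
            # while decoding its children can point back to it.
            anchors[node.anchor] = result
        for key, child in node.items.items():
            result[key] = build(child, anchors)
        return result
    return node  # plain scalar


font = MapNode({"NSName": "LucidaGrande", "NSSize": 12}, anchor="b")
tree = MapNode(
    {
        "NSTag": -1,
        "NSCell": MapNode(
            {"NSSupport": font, "NSControlView": Alias("a"), "NSAlternateImage": Alias("b")}
        ),
    },
    anchor="a",
)
button = build(tree)
print(button["NSCell"]["NSControlView"] is button)  # True
```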

If you’re aware of how NSKeyedArchives are implemented, you might argue that I’m wrong, that there’s only one representation - plists. You’d be sort of right; NSKeyedArchives are actually encoded as binary plists, but there’s lots going on in there that isn’t ‘native’ plist, but instead an encoding mapped on to it. Saying that NSKeyedArchives are just plists is like saying that a plist is just UTF-8 text - it’s ‘true’, but it’s not true.

So, enter YAML. As you’ve seen above, simple YAML looks like a simply typed, straight-line representation when you read it - like JSON with a richer syntax. YAML does, though, also have the ability to encode type information explicitly. This type information can specify arbitrary application-defined types. It also has the ability to define ‘anchors’ in a document, and to refer to these anchors as ‘aliases’ later in the document, enabling it to encode arbitrary object graphs.

Incidentally, anchors can be useful in hand-written YAML too. For example, say you had some complex option, string, or regex that you wanted to use throughout a configuration file. With plists or JSON you’d be entering the entire thing every time. With YAML, you’d give its first use an anchor, and then just refer to it later.

So, with the ability to encode types and object graphs, we have the capacity to archive arbitrary Cocoa objects. What’s more, Apple has already given us an API to do it: the NSCoding protocol and the NSCoder class, as used by NSKeyedArchiver. NSCoding is already implemented by many classes.

YACYAML provides two NSCoder subclasses, YACYAMLKeyedArchiver and YACYAMLKeyedUnarchiver, and uses them to implement the YACYAMLEncodedString and YACYAMLDecode convenience methods on NSObject and NSString (and NSData).
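As a rough illustration of the NSCoding pattern itself, here’s a hypothetical Python analogue. KeyedArchiver, encode_with_coder and archived_root are names I’ve made up, loosely mirroring encodeWithCoder: and encodeObject:forKey:. Each object describes itself by handing keyed values to the coder, and the coder records a mapping tagged with the object’s class, much like the !NSButton and !NSFont nodes in the example below. For brevity it doesn’t track object identity, so unlike YACYAML it can’t emit anchors and would recurse forever on a cycle.

```python
class KeyedArchiver:
    """Hypothetical keyed coder: records each object as a class-tagged mapping."""

    def __init__(self):
        self._frames = []  # one dict of keyed fields per object being encoded

    def encode(self, value, key):
        """Analogue of encodeObject:forKey: on the object currently being encoded."""
        self._frames[-1][key] = self._encode_value(value)

    def _encode_value(self, value):
        if hasattr(value, "encode_with_coder"):
            fields = {}
            self._frames.append(fields)
            value.encode_with_coder(self)  # the object calls back into encode()
            self._frames.pop()
            return {"tag": "!" + type(value).__name__, "fields": fields}
        return value  # scalars are stored as-is

    @classmethod
    def archived_root(cls, root):
        return cls()._encode_value(root)


class Font:
    def __init__(self, name, size):
        self.name, self.size = name, size

    def encode_with_coder(self, coder):
        coder.encode(self.name, "NSName")
        coder.encode(self.size, "NSSize")


class ButtonCell:
    def __init__(self, contents, font):
        self.contents, self.font = contents, font

    def encode_with_coder(self, coder):
        coder.encode(self.contents, "NSContents")
        coder.encode(self.font, "NSSupport")


archive = KeyedArchiver.archived_root(ButtonCell("Click me", Font("LucidaGrande", 12)))
print(archive)
```

The point of the design is that the archiver never needs to know anything about Font or ButtonCell; each class owns its own list of keys, which is why existing NSCoding classes like NSButton work with a new coder for free.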

To illustrate encoding of arbitrary objects, here’s a YAML representation of a Mac OS X NSButton. Note that I didn’t implement anything special to get this. NSButton already conforms to NSCoding, so this is just the output of [YACYAMLKeyedArchiver archivedStringWithRootObject:myButton] on a regular NSButton.

&a !NSButton
NSNextResponder:
NSvFlags: 256
NSFrame: '{{20, 20}, {400, 50}}'
NSTag: -1
NSEnabled: y
NSCell: !NSButtonCell
    NSCellFlags: 67239424
    NSCellFlags2: 134217728
    NSContents: If you wanna come to my house, then click me with your mouse.
    NSSupport: &b !NSFont
        NSName: LucidaGrande
        NSSize: 12
        NSfFlags: 4880
    NSControlView: *a
    NSButtonFlags: -1232977665
    NSButtonFlags2: 2
    NSAlternateImage: *b
    NSKeyEquivalent: ''
    NSPeriodicDelay: 400
    NSPeriodicInterval: 75

In Conclusion

So, there we have it. YACYAML. An excursion prompted by the need to write a small config file that’s resulted in a new Cocoa serialisation format. I’m pleased with the result though. I hope you are too.

If you’re interested in learning more about YAML, Wikipedia’s YAML entry is pretty good, and the official spec can be found at yaml.org. YACYAML is, as I mentioned, on GitHub, and the YACYAML ReadMe there is a more technical counterpart to this blog post.