Tips for designing a JSON Schema

I've been writing a lot of JSON schemas lately.
Some were just small API requests and responses.
Some were big, all-encompassing documents that involve over 20 unique object types.

When I started the big ones, I went looking around the internet for some resources. What I found was plenty on how to write a JSON Schema (This being the best one), but not on how to design one.
In contrast, there's plenty of material available on how to design relational database schemas - from blog articles to university subjects.
I still managed to come up with something resembling a process, so I thought I'd share with the class in the hopes it'll help someone else.

Just a quick disclaimer: These are just my own personal findings and opinions.
If you have your own findings that you think are better, by all means stick to them. Share them with the class too, if you can.

Read the 'JSON API' standard

I cannot emphasise this point enough.
Go to http://jsonapi.org/ and read through the documentation and examples.
I'm not expecting you to implement it - especially if you're not creating JSON Schemas for a web API - but it's a great example of a JSON-based standard, and well worth the time to look at. I've based a lot of my own schemas off elements of it, without implementing it completely.
Things like using the "id", "type", and "links" keywords.

If you do decide to implement it, take into account that it's a work in progress.
In the last month they've published a release candidate which had some significant differences from the previously published versions. But the fact that they're calling it a "release candidate" says it should be pretty stable by now.

Keep a flat structure

Don't embed objects within objects within objects within objects.
You're better off keeping a flat structure of one or more arrays at the top level, each containing objects.
Rather than nesting one object inside another, create a separate object in one of those arrays and link the two together - even if you're sure it's a 1-to-1 relationship for both objects.
The JSON API uses a single "included" array that contains all additional objects. I personally prefer having multiple arrays that group the objects by "type" - but that's me.
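
If you receive a single mixed array and prefer the grouped layout, the regrouping is only a few lines. This is a rough sketch, assuming every object carries a "type" property; groupByType is a name I made up for illustration, not something from the JSON API standard.

```javascript
// Hypothetical helper: splits one mixed array into one array per "type".
function groupByType(included) {
  var grouped = {};
  included.forEach(function(object) {
    // Create the array for this type the first time we see it.
    if (!Object.prototype.hasOwnProperty.call(grouped, object.type)) {
      grouped[object.type] = [];
    }
    grouped[object.type].push(object);
  });
  return grouped;
}

var grouped = groupByType([
  {"type": "people", "id": "person1"},
  {"type": "comments", "id": "comment1"},
  {"type": "people", "id": "person2"}
]);
// grouped.people holds 2 objects, grouped.comments holds 1
```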

You can also use the same prefix for property names, rather than creating a new embedded object.
eg. {"name": {"first": "John", "last": "Smith"}} becomes {"nameFirst": "John", "nameLast": "Smith"}
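
If you already have nested data, the renaming can be done mechanically. A minimal sketch, assuming a made-up helper called flattenWithPrefix that only flattens plain nested objects and leaves arrays and primitive values alone:

```javascript
// Hypothetical helper: turns {"name": {"first": "John"}} into {"nameFirst": "John"}.
function flattenWithPrefix(object, prefix) {
  var result = {};
  Object.keys(object).forEach(function(key) {
    var value = object[key];
    // Top-level keys keep their name; nested keys become prefix + Capitalised key.
    var name = prefix ? prefix + key.charAt(0).toUpperCase() + key.slice(1) : key;
    var isPlainObject = value !== null && typeof value === 'object' && !Array.isArray(value);
    if (isPlainObject) {
      // Recurse, then copy the flattened child properties up a level.
      var nested = flattenWithPrefix(value, name);
      Object.keys(nested).forEach(function(k) { result[k] = nested[k]; });
    } else {
      result[name] = value;
    }
  });
  return result;
}

var flat = flattenWithPrefix({"name": {"first": "John", "last": "Smith"}, "age": 30});
// flat is {"nameFirst": "John", "nameLast": "Smith", "age": 30}
```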

Why?

  • It means less code like this:
    var value = data.article && data.article.author && data.article.author.phone && data.article.author.phone.number;
  • It simplifies things if you're marshalling and unmarshalling your JSON into and out of other data structures, e.g. classes and SQL tables.
    Even if you're using a NoSQL DB with NodeJS today - tomorrow you might create a new microservice in something new. Doesn't hurt to be flexible.
  • You have a 1-to-1 link today, but a new feature tomorrow might change that.
    Keeping things flat gives you some extra flexibility for the future.

For example, the article and its author sit side by side at the top level, joined by a link:
{
  "articles": [{
    "id": "article1",
    "links": {
      "author": {"type": "people", "id": "person1"}
    }
  }],
  "people": [{
    "id": "person1",
    "phoneNumber": "1234"
  }]
}

Work out how relationships are defined

Even if you don't go with a flat structure, you're going to need some way to say "X relates to Y because Z" - unless you honestly plan to duplicate and embed each and every object.
Take a look at how JSON API does it - the way the "links" property contains relationships, and each of those has one or more "linkage" objects holding the "id" and "type" of the related object.
You don't need to do the exact same thing. But you should be consistent throughout your entire schema.
Preferably throughout all the schemas in your organisation, if you can manage it.

Why?

Because if you have a consistent format for defining relationships, you can write functions that will traverse those relationships for you.

/*
  For this schema, each object's relationships are defined as an object under
  a "links" property.
  Each property of that object is the name of the relationship, and the value
  is a single object with the "type" and "id" of the related object.
  For "to-many" relationships, the value can be an array of the same objects.
*/
var data = {
  "articles": [{
    "id": "article1",
    "links": {
      "author": {"id": "person1", "type": "people"}
    }
  }],
  "people": [{
    "id": "person1",
    "phoneNumber": "1234"
  }]
};

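function getLink(data, object, linkName) {
  // Look up the named relationship; objects without "links" simply have none.
  var link = object.links && object.links[linkName];
  if (link && data[link.type]) {
    // Match on the linked object's "id" within the array for its "type".
    return data[link.type].find(function(e) {return e.id === link.id;});
  }
}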

getLink(data, data.articles[0], 'author').phoneNumber; // "1234"
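
The comment above mentions that "to-many" relationships can be an array. Here's a rough sketch of a companion function that handles both shapes, assuming the same "links" convention. The getLinks name, the "reviewers" relationship and the second person are made up for illustration.

```javascript
// Hypothetical companion to getLink: always returns an array of related objects,
// whether the link value is a single {"type", "id"} object or an array of them.
function getLinks(data, object, linkName) {
  var value = object.links && object.links[linkName];
  if (!value) {
    return [];
  }
  var linkages = Array.isArray(value) ? value : [value];
  return linkages
    .map(function(link) {
      var candidates = data[link.type] || [];
      return candidates.find(function(e) { return e.id === link.id; });
    })
    // Drop links that point at objects missing from the document.
    .filter(function(e) { return e !== undefined; });
}

var blog = {
  "articles": [{
    "id": "article1",
    "links": {
      "author": {"type": "people", "id": "person1"},
      "reviewers": [
        {"type": "people", "id": "person1"},
        {"type": "people", "id": "person2"}
      ]
    }
  }],
  "people": [
    {"id": "person1", "phoneNumber": "1234"},
    {"id": "person2", "phoneNumber": "5678"}
  ]
};

var reviewers = getLinks(blog, blog.articles[0], 'reviewers'); // two people
var authors = getLinks(blog, blog.articles[0], 'author');      // one person, still in an array
```

Always returning an array means the calling code doesn't need to care whether a relationship is to-one or to-many today.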

New properties are cheap

Sometimes there's the temptation to reuse the same property for a slightly different purpose.
Maybe you want to define an "accountType" which then gives a completely different context to the 5 other properties, and you think "I just saved myself from writing 5 new properties into the schema, for the price of 1 - woooo!"

Don't do that.

Of all the possible schema changes you could make - new properties are the cheapest.
Don't make things more confusing by trying to shove square pegs into round holes.
Unless you are writing super memory efficient software, where you need to squeeze every last bit from the hardware (in which case, why are you using JSON at all?) - just make a new property.
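
To make that concrete, here's a small sketch of the difference. Everything in it is hypothetical: the "limit", "creditLimit" and "dailyWithdrawalLimit" properties and both function names are just for illustration.

```javascript
// Overloaded (hypothetical): "limit" means a credit limit for credit accounts
// and a daily withdrawal limit for savings accounts.
var overloaded = {"accountType": "savings", "limit": 500};

function creditLimitOverloaded(account) {
  // Every reader has to remember to check the type before trusting "limit".
  return account.accountType === 'credit' ? account.limit : undefined;
}

// One property per meaning (hypothetical names): no context needed to read it.
var explicit = {"accountType": "savings", "dailyWithdrawalLimit": 500};

function creditLimitExplicit(account) {
  return account.creditLimit;
}
```

In the second version, forgetting to check "accountType" can't silently hand you the wrong kind of limit.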

Prepare for change

No matter how hard you try, how long you work, how much you analyse and survey and workshop - there's going to be something you need to change in the structure of the schema.
Now you can either live in constant fear of this day, or you can work out your strategy ahead of time.

Embed the version of the schema in your JSON documents and have a process in place for running migration scripts.
Whether your strategy is a 'big-bang' during downtime or an ongoing background process, just have your migration process ready.
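
As a rough sketch of what that can look like: assume each document carries a numeric "schemaVersion" property, and you keep one small migration function per version. The property name, the version numbers and the two migrations below are all made up for illustration.

```javascript
// Hypothetical migrations, keyed by the version they upgrade FROM.
var migrations = {
  1: function(doc) {
    // v1 -> v2: replace the nested "name" object with prefixed properties.
    doc.nameFirst = doc.name.first;
    doc.nameLast = doc.name.last;
    delete doc.name;
    return doc;
  },
  2: function(doc) {
    // v2 -> v3: make sure every document has a "links" object.
    doc.links = doc.links || {};
    return doc;
  }
};
var CURRENT_VERSION = 3;

function migrate(doc) {
  if (typeof doc.schemaVersion !== 'number') {
    throw new Error('Document has no numeric schemaVersion');
  }
  // Work on a copy so a failed migration doesn't leave the original half-changed.
  var result = JSON.parse(JSON.stringify(doc));
  while (result.schemaVersion < CURRENT_VERSION) {
    var step = migrations[result.schemaVersion];
    if (!step) {
      throw new Error('No migration from version ' + result.schemaVersion);
    }
    result = step(result);
    result.schemaVersion += 1;
  }
  return result;
}

var upgraded = migrate({"schemaVersion": 1, "name": {"first": "John", "last": "Smith"}});
// upgraded.schemaVersion is 3, upgraded.nameFirst is "John", upgraded.links is {}
```

The same function works for a 'big-bang' run over every document or for upgrading documents lazily as they're read.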

Use "allOf" to create schema mixins

I said this wasn't about how to write a JSON Schema, just how to design it - this is the exception.

I gave the example earlier of prefixing property names rather than nesting objects - 'name' to 'nameFirst' and 'nameLast'.
But what if you have 5 different objects that need to have a 'name'?
Am I supposed to duplicate that in the schema 5 times too?

No.

The JSON Schema standard defines a number of keywords that can be used to combine schemas: "oneOf", "allOf", and "anyOf".

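  • "oneOf" is good for "switch"-style branching: exactly one of the sub-schemas must match.
  • "anyOf" is for when you're a lot more forgiving: at least one of the sub-schemas must match.
  • "allOf" is great for constructing a schema from sub-schemas: every one of them must match.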

For example you can create a sub-schema for an object with "nameFirst" and "nameLast" properties, and include it as part of the schema for a completely different object.

{
  "definitions": {
    "nameMixin": {
      "type": "object",
      "properties": {
        "nameFirst": {"type": "string"},
        "nameLast": {"type": "string"}
      },
      "required": ["nameFirst", "nameLast"]
    },
    "person": {
      "allOf": [
        {"$ref": "#/definitions/nameMixin"},
        {
          "type": "object",
          "properties": {
            "age": {"type": "number"}
          },
          "required": ["age"]
        }
      ]
    }
  }
}
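
To show what "allOf" is doing there, here's a toy validator. It's only a sketch: it understands just "$ref" (in the local "#/definitions/..." form), "allOf", "type", "required" and "properties", and the resolveRef and isValid names are mine. For real work, use a proper JSON Schema validator library.

```javascript
var personSchema = {
  "definitions": {
    "nameMixin": {
      "type": "object",
      "properties": {"nameFirst": {"type": "string"}, "nameLast": {"type": "string"}},
      "required": ["nameFirst", "nameLast"]
    },
    "person": {
      "allOf": [
        {"$ref": "#/definitions/nameMixin"},
        {"type": "object", "properties": {"age": {"type": "number"}}, "required": ["age"]}
      ]
    }
  }
};

// Resolves a local reference such as "#/definitions/nameMixin" against the root schema.
function resolveRef(root, ref) {
  return ref.replace(/^#\//, '').split('/').reduce(function(node, key) {
    return node[key];
  }, root);
}

// Toy validator: returns true if the value satisfies the (very limited) schema.
function isValid(root, schema, value) {
  if (schema.$ref) {
    return isValid(root, resolveRef(root, schema.$ref), value);
  }
  // "allOf": the value has to satisfy every listed sub-schema.
  var allOfOk = (schema.allOf || []).every(function(sub) {
    return isValid(root, sub, value);
  });
  if (!allOfOk) return false;

  var isObject = value !== null && typeof value === 'object' && !Array.isArray(value);
  if (schema.type === 'object' && !isObject) return false;
  if (schema.type === 'string' && typeof value !== 'string') return false;
  if (schema.type === 'number' && typeof value !== 'number') return false;
  if (!isObject) return true;

  var has = function(key) { return Object.prototype.hasOwnProperty.call(value, key); };
  if (!(schema.required || []).every(has)) return false;

  // Only properties that are present get checked against their sub-schema.
  var properties = schema.properties || {};
  return Object.keys(properties).every(function(key) {
    return !has(key) || isValid(root, properties[key], value[key]);
  });
}

var person = personSchema.definitions.person;
var ok = isValid(personSchema, person, {"nameFirst": "John", "nameLast": "Smith", "age": 30});
var missingName = isValid(personSchema, person, {"nameFirst": "John", "age": 30});
// ok is true; missingName is false because the mixin requires "nameLast"
```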

Cheers,
Jason Stone

(Originally posted 28 Mar 2015 at legacytotheedge.blogspot.com.au)