API docs with Apiary

By Dale Humby

In this post I explain how we’ve taken a documentation-first approach to our http/REST API, and the pipeline we built to enforce strong contracts between our services.

About a year ago at Nomanini we needed to expose some of our internal services to an important client. We were also transitioning from dashboards rendered server-side with Django to an AngularJS web app that had to pull data from our backend. We exposed a public API that our clients, web app and tooling could use.

Based largely on the success of this, we are migrating to microservices, with each service exposing a REST API to other services as well as to external API consumers like our web app (now ReactJS), Android app, internal tooling, and clients who wish to write their own tools or apps.

What makes a good API?

My key requirements for good API’s are:

HTTP

  • Use standard http verbs correctly, and return sensible http response codes.

RESTful

  • Difficult to get right, and I certainly don’t claim that our API is totally RESTful, but that’s the goal.
  • We did not attempt HATEOAS.

JSON

  • We already use JSON everywhere within our system. It’s more lightweight than XML, easier to read, compresses well and felt less opaque than e.g. protocol buffers.

Useful error messages that include:

  • an error code that programs can use to determine the sort of error and affect program flow. These supplement, and are more fine-grained than, the http error codes. We decided on string error codes (e.g. id_not_found) instead of error numbers (e.g. 42), which could be confused with http error codes (404).
  • a human-readable error message that tells a developer what went wrong and, importantly, gives direction on how they might fix the error.
  • a direct link to the relevant API documentation.
  • a unique trace ID that our support team can use to find the exact log entry if someone emails us for support.

Documents

  • that are the specification for the implementation (inspired by the Test Driven Development workflow), and not something to slog through after the API is written “because management says so.”
  • that are easy to read, and give developers context about the endpoints and tell them when and why they should use a particular endpoint, what to send and what to expect back.
  • useful examples that show real world use cases.
  • a schema, for validation of incoming and outgoing packets
    • There was one case (before all of this) where we wrote a spec and had two developers, one writing the client, the other writing the server, and each interpreted the design doc differently. It took an entire day of unnecessary back-and-forth emails and chat to track down the bug.
  • the examples in the docs should all be tested
    • against the schema to throw out any obvious errors.
    • against a server to check our implementation against the docs.
  • shouldn’t be open to interpretation or require guessing. All the context, background and useful information should be in the docs.

Methods we tried for writing API docs

reStructuredText

Free-form docs, rendered to html. RST is designed for writing docs in general, not specifically for API’s. This gave space for context and examples, but we had to design our own documentation format (the look and feel.) Writing examples was easy enough, but the examples were often out of date compared to the actual endpoints implemented on the backend.

What we required was a tool chain to test the examples against the endpoint to ensure that what was implemented complied with the docs (which we treat as the specification/contract.)

Swagger

Next I experimented with Swagger. The focus seems to be on experimenting with endpoints: in many sample API’s I could see the JSON and parameters, but the meaning of the parameters was almost never documented, nor the context of when or why I should use specific endpoints. That’s not to say it cannot be done, but many of the examples I found online did not have the ‘why’ context.

I also found it difficult writing docs in YAML.

Apiary’s API Blueprint

Designed for collaborative online API document editing. I think they still have a long way to go until they have the Google Docs equivalent for API’s.

API Blueprints are written in Markdown, using Apiary’s specific format. There are just enough constraints that all the docs look the same: there’s a place for URL’s, parameters, bodies, headers, responses and schemas, as well as room for providing context through written paragraphs and examples - something I couldn’t figure out with Swagger (which I felt focused too much on ‘how do I use this endpoint’ and not enough on ‘why should I use this endpoint.’)

The Apiary online service has an online editor that parses the Blueprint markdown, renders it in to html, hosts the docs for you, and makes example servers available for testing and exploration.

They also have an extensive open source tool chain:

Drafter parses the API Blueprint markdown and outputs a JSON document with a specific schema. This turns your markdown API docs into a machine-readable (JSON) version.

Aglio renders the markdown docs into html, which can then be hosted as a static website.
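
As a rough sketch, a build step might invoke both tools like this (the exact CLI flags are an assumption and depend on the installed versions):

import json
import subprocess

# Parse the Blueprint into the machine-readable JSON form with Drafter.
result = subprocess.run(
    ["drafter", "--format", "json", "money.apib"],
    capture_output=True, text=True, check=True,
)
blueprint = json.loads(result.stdout)

# Render the human-readable html docs with Aglio.
subprocess.run(["aglio", "-i", "money.apib", "-o", "money.html"], check=True)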

The downside of API Blueprints

API Blueprint is still in a state of flux with new features often added, which sometimes breaks backward compatibility.

They also assume that your whole API is in one (huge!) file, which makes me think they’ve never written a large API before. Our compromise is to have a documentation folder for each service, and to split each collection of endpoints into its own file. This has the downside that models defined in one file cannot be referenced in another file. I’ve not found this to be too much of a problem, provided that we have the correct boundaries between services and collections of endpoints within a service.

The example docs that Apiary hosts as references are trivial as far as large application API’s go. Once you want to get something slightly more complex you’re on your own, and have to wade through the Blueprint spec, which has a steep learning curve just to understand all the concepts.

That aside, once you understand the patterns and have a few endpoints written, it’s a lot easier. And then other people on the team can view your docs and use those as patterns for the next endpoints.

JSON Schema

As part of the API Blueprint you can define a JSON Schema for requests and responses. We use this to enforce strong API contracts. The JSON that we accept in POST’s and return in responses is validated against the relevant schema at run time.

As part of our CI process we validate the examples that we wrote in the docs against their JSON Schemas to make sure the examples are correct.
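
For example, checking one documented response example against its schema takes only a couple of lines with the Python jsonschema library (the schema and example below are placeholders, not our real ones):

from jsonschema import ValidationError, validate

account_schema = {
    "type": "object",
    "properties": {"balance": {"type": "integer"}},
    "required": ["balance"],
}
documented_example = {"balance": 1000}

try:
    validate(instance=documented_example, schema=account_schema)
except ValidationError as e:
    print("Doc example does not match its schema:", e.message)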

Our solution: API Blueprints + JSON Schema

Apiary has a good, open source tool chain, which others have built tools around. (Like Aglio for rendering html versions of the docs.)

We don’t use the online Apiary service at all - only their open source tools. (Sorry Apiary…)

API documentation workflow in CI

Input: the API Blueprint for a service, e.g. money.apib

Output: rendered html docs, JSON Schemas and validation metadata for the server, example JSON for our mock server, and a Postman collection (each described below)

API Blueprints are written by developers (in the Apiary Markdown format) before we implement the endpoint; they act as the specification. All docs include example JSON and JSON Schemas for both requests and responses.

The API Markdown files are committed along with the rest of the services’ code.

Custom tools

Running Drafter takes the Markdown files and converts them to the Apiary JSON format for use with our tools.

One of our tools pulls out all of the JSON examples and the JSON Schemas, and tests that

  1. our schemas are valid against the JSON Schema meta-schema, and
  2. our own examples are valid JSON and comply with their respective schemas (see the sketch below).
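
In outline, those two checks look something like this. The extraction of schema/example pairs from the Drafter output is our own tooling and is elided here:

from jsonschema import Draft4Validator, validate

def check_docs(extracted_pairs):
    """extracted_pairs: (schema, example) tuples pulled from the Drafter JSON."""
    for schema, example in extracted_pairs:
        # 1. The schema itself must be valid against the JSON Schema meta-schema.
        Draft4Validator.check_schema(schema)
        # 2. Every documented example must validate against its own schema.
        validate(instance=example, schema=schema)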

We also check the docs for the pedantic stuff, like enforcing sentence case for all headings and spell checking.

The JSON examples are extracted for our mocked-out backend server.

All the JSON Schemas, the URL’s, the URL parameters (their types - integer, string, etc. - and whether they are required) and the permissions required are written to another JSON file (in our own format) that is deployed with our server code and used for runtime validation of incoming requests and outgoing responses.
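
A purely hypothetical entry in such a file might look like the following - the real format is our own and internal, this only illustrates the kind of information it carries:

{
  "/accounts/{account_id}": {
    "method": "GET",
    "parameters": {
      "account_id": {"type": "integer", "required": true}
    },
    "permissions": ["self", "money:view"],
    "response_schema": {"$ref": "money/account_response.json"}
  }
}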

Static website hosts the docs

The Apiary files are run through Aglio, which renders the html version of the docs. This becomes our static documentation website, hosted on Google Cloud Storage.

In the CI process these artifacts are added to the other artifacts generated in the build and test cycle, ready for deployment into our CI, staging and production environments.

Request and response validation at runtime

Within our application we validate all incoming requests and outgoing responses against our schemas.

  • Incoming GET requests, to make sure only the specified parameters are in the URL and that each parameter matches its declared type
  • The body of a POST validates against the request schema
  • The body of our response validates against the response schema
  • The required user permissions for the endpoint are met

Validating the outgoing body ensures that we are sticking to our own spec. (Implementation sometimes differs from the docs and this catches those problems early.)

After validation

  • URL parameters are cast into their correct Python types
  • Datetimes are converted to UTC
  • Incoming JSON (in a POST) is cast into a Python dictionary

After this the object is handed off to the specific Flask handler for that endpoint.

This vastly simplifies application code because the handler does not have to care about whether info that it requires might be missing or invalid. The handler is only called if validation passes all checks.
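
As a rough illustration of that layer, here is a minimal sketch of a Flask decorator performing these checks. All the names (validated, spec, request_schema, response_schema) are hypothetical stand-ins, not Nomanini’s actual code:

import functools

from flask import jsonify, request
from jsonschema import ValidationError, validate

def validated(spec):
    """Validate a request against `spec` (one endpoint's entry in the
    generated validation file) before the handler runs."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(**url_params):
            # Only the documented query parameters may appear in the URL.
            for name in request.args:
                if name not in spec["parameters"]:
                    return jsonify(errors=[{"code": "unexpected_parameter"}]), 400
            body = None
            if request.method == "POST":
                body = request.get_json()
                try:
                    validate(instance=body, schema=spec["request_schema"])
                except ValidationError as e:
                    return jsonify(errors=[{"code": "validation_failed",
                                            "developer_message": e.message}]), 400
            # (Casting URL parameters to Python types and converting
            # datetimes to UTC would also happen here.)
            response = handler(body=body, **url_params)
            # Validate our own response so the implementation can't drift from the docs.
            validate(instance=response, schema=spec["response_schema"])
            return jsonify(response)
        return wrapper
    return decorator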

Errors and error messages

If validation fails we generate a nice error message that developers can use to debug their client application.

The correct http response code is returned, reusing the existing http error codes instead of making up our own.

The http body contains a detailed error message:

Status-Code: 400 Bad Request

{
  "errors": [
    {
      "code": "no_data",
      "developer_message": "There is no data for the given date range, try a larger date range.",
      "more_info": "http://info.nomanini.com/money.html#summaries-transactionsummary-get",
      "request_id": "54c3c61900ff0d1d98752fb3160001737e6e6f6d616e696e692d64617368626f6172640001353430332d63303965616534666631383400010101"
    }
  ]
}

Error messages contain:

  • a standard http response code, sticking closely to RFC 7231.
  • a more detailed error code (over and above the http response code) that programs can use to determine the sort of error and affect program flow. We decided on words as error codes instead of opaque numbers.
  • a human-readable error message that tells you what went wrong and gives some hints on how to fix the problem. We often use the error messages returned by the Python jsonschema library.
  • a link to the docs for easy reference. Because the JSON Schema used for validation is pulled from the docs, we know where in the docs the schema came from.
  • a unique request ID that links directly to that call’s log entry. Request ID’s are generated by our API gateway, sent to all downstream services and used in all logging. We return the request ID to make searching logs easier.
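
A minimal sketch of how such an error body could be assembled in Flask (the helper name and arguments are illustrative, not our exact code):

from flask import jsonify

DOCS_BASE_URL = "http://info.nomanini.com/money.html"

def error_response(status, code, message, docs_anchor, request_id):
    # Build the standard error envelope shown above.
    body = {"errors": [{
        "code": code,
        "developer_message": message,
        "more_info": DOCS_BASE_URL + "#" + docs_anchor,
        "request_id": request_id,
    }]}
    return jsonify(body), status

# e.g. error_response(400, "no_data",
#                     "There is no data for the given date range, try a larger date range.",
#                     "summaries-transactionsummary-get", request_id)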

API endpoint permissions

At present, API Blueprints don’t have any method to communicate permissions required for endpoints. We’ve extended the API Blueprint format with our own formatting to include which permissions are required to access each endpoint.

For example, in our Money API, the docs that explain List Accounts specify that the permission self or money:view is required to access the endpoint. In this case, self means that you can get a list of your own accounts, and someone (like an administrator) with the money:view permission can get a list of your accounts.

At the end of the description section of the endpoint we add a permissions section.

### List accounts [GET]
List the accounts on the system.

**Permissions:** self, money:view

As part of the CI build pipeline, permissions are extracted from the API Blueprint JSON file and written to their own JSON file for use by the application to validate if users are allowed to make calls to that API endpoint.
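
Since the permissions line is just Markdown inside the endpoint description, the extraction can be a simple pattern match over the description text in the Drafter output. A sketch (the traversal of the Drafter JSON itself is elided, and varies between Drafter versions):

import re

PERMISSIONS_RE = re.compile(r"\*\*Permissions:\*\*\s*(.+)")

def extract_permissions(description):
    """Return the permission names from an endpoint's description text."""
    match = PERMISSIONS_RE.search(description)
    if not match:
        return []
    return [p.strip() for p in match.group(1).split(",")]

# e.g. extract_permissions("List the accounts...\n**Permissions:** self, money:view")
# -> ["self", "money:view"]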

Postman Collections

We are huge fans of Postman for poking around API’s. For each of our services we also transcode the JSON file generated by Drafter into a Postman collection. I wrote a hacky script to do the transform from API Blueprint JSON to a Postman Collection.
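
The transform is roughly of this shape - a heavily simplified sketch against the Postman Collection v2 format, where the real script handles far more detail (headers, bodies, folders per resource):

import json

def to_postman(name, endpoints):
    """endpoints: (name, method, url) tuples pulled from the Drafter output."""
    return {
        "info": {
            "name": name,
            "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json",
        },
        "item": [
            {"name": n, "request": {"method": m, "url": u}}
            for (n, m, u) in endpoints
        ],
    }

collection = to_postman("Money API",
                        [("List accounts", "GET", "https://api.example.com/accounts")])
print(json.dumps(collection, indent=2))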

Putting it all together

All of this work means that developers who are using Nomanini’s API’s have

  • Good html documentation hosted on a publicly accessible site
  • Postman to play with the API (against our dev server)
  • JSON Schemas that catch errors
  • Required permissions shown in the docs
  • And automatically generated, helpful error messages

By writing API documentation first we have seen a significant improvement in our API design and faster time to implement clients against our API’s. This is largely driven by fewer questions to our backend team, less ambiguity and fewer bugs in production.

The future

There is no way (yet) to version API endpoints in API Blueprints. I’m hoping they’ll add ways to handle that, as well as a way to mark what state an endpoint is in (documented, alpha, beta, stable, deprecated) so that API consumers have some idea of the stability of the endpoint.

Native handling of permissions would be useful.

We’d like to expand our toolchain to make the API endpoint schemas available to developers for use within their own dev CI environments. That way they can test their clients in CI without having to run integration tests against a remote test server.

As we grow it would be nice to automatically generate client libraries from the docs. This is something that Swagger does well. We could use an API Blueprint to Swagger transcoder like apib2swagger, and then use Swagger to generate client libraries.

Conclusion

Writing good documentation, and then using it within the production system - for unit testing and server mocks, for parameter, request and response validation, and for user access control - has reduced the load on developers significantly, especially as the number of internal development teams and services grows.

With this toolchain we are able to develop API endpoints faster, and support more client integrations into our API’s with minimal support load on our development teams.

Tools we use

  • Drafter - parses API Blueprint Markdown into a machine-readable JSON document
  • Aglio - renders API Blueprints as a static html site
  • jsonschema - Python library for validating JSON against a JSON Schema
  • Postman - for exploring and testing API’s
  • apib2swagger - API Blueprint to Swagger transcoder