API V4 Code Generation

Jason Dagit edited this page Aug 16, 2017 · 3 revisions

Description: Generate Haskell bindings to the V4 API using the upstream OpenAPI specifications.

Background

The current bindings to Mattermost are handwritten, based on a combination of the available documentation, the Mattermost server code, and inspection of browser activity. This process works and allows us to carefully craft each API function, but it can be labor-intensive and is error-prone.

Starting with version 4, the Mattermost developers provide a machine-readable specification for the HTTP endpoints of the Mattermost server[^1]. This specification is written in a notation known as OpenAPI (previously known as Swagger). The specification can be found here: https://github.com/mattermost/mattermost-api-reference

[^1]: Jason: I'm personally optimistic that their goal is to generate some of their own code from this specification. In other words, I'm hopeful that it will eventually be the "ground truth" for the API. Alas, I haven't been able to find any official statements about this.

Tools already exist for generating code from an OpenAPI specification. We could use them to generate our bindings, but after a cursory look at their output we concluded that we want more control over the generated code. For example, the generated code is locked to older versions of Haskell libraries, so to use it we would have to update the existing code generator to target newer versions. We feel that this maintenance burden will be lighter with a custom code generator.

Initial Investigation

The Mattermost OpenAPI documents are written in YAML. We created a repo to explore processing the Mattermost API reference: https://github.com/matterhorn-chat/mattermost-api-gen

For convenience of processing, I converted the YAML documents into JSON files. JSON should be an equivalent format to YAML[^2] but Haskell has slightly better libraries for working with JSON. The scripts for doing this can be found in the scripts directory. See convert.sh.

[^2]: In fact, at least one popular Haskell library for YAML parsing returns the data as JSON values.
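
The conversion itself is small enough to sketch in Haskell. The following is not the repo's convert.sh; it is a minimal illustration that assumes the `yaml` and `aeson` packages are available. The function names `yamlToJson` and `convertFile` and the sample document `exampleYaml` are made up for this sketch.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Aeson as Aeson
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import qualified Data.Yaml as Yaml

-- Parse YAML into an aeson Value, then serialize that Value as JSON.
-- (Hypothetical helper; not part of the mattermost-api-gen repo.)
yamlToJson :: BS.ByteString -> Either String LBS.ByteString
yamlToJson input =
  case Yaml.decodeEither' input of
    Left err    -> Left (Yaml.prettyPrintParseException err)
    Right value -> Right (Aeson.encode (value :: Aeson.Value))

-- Convert one YAML file on disk into a JSON file.
convertFile :: FilePath -> FilePath -> IO ()
convertFile src dst = do
  bytes <- BS.readFile src
  case yamlToJson bytes of
    Left err   -> ioError (userError err)
    Right json -> LBS.writeFile dst json

-- A tiny, made-up fragment in the shape of an OpenAPI document.
exampleYaml :: BS.ByteString
exampleYaml = "paths:\n  /users:\n    get:\n      summary: Get users\n"
```

Because the YAML parser already produces an aeson `Value`, the JSON step is a single `encode` call, which is the equivalence the footnote above relies on.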

Initially, I tried pulling the data apart using lens operators over the JSON values. I quickly realized this is not a good way to process the document.

Next Steps

The Mattermost API document uses lots of domain-specific concepts. The right way to process this document is to make Haskell types that correspond to these concepts. This will let us separate the parsing step from processing and write simpler processing code, as it won't have to deal with so many special cases.

Example concepts that we should model in Haskell:

  • Parameters: The OpenAPI document has fields for whether a parameter is required and for the types of parameters. This information will be useful in generating the correct Haskell code.

  • Responses: The expected HTTP responses and data carried in those responses can be used to generate error handling code.
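
As a rough illustration of the modeling idea, the two concepts above might be captured with types along these lines. All type, field, and function names here are assumptions made for this sketch rather than names taken from the mattermost-api-gen repo, and the example endpoint is hypothetical.

```haskell
-- Where a parameter travels in the HTTP request (OpenAPI's "in" field).
data ParamLocation = InPath | InQuery | InHeader | InBody
  deriving (Eq, Show)

-- A small, illustrative subset of OpenAPI's parameter types.
data ParamType = PString | PInteger | PBoolean | PArray ParamType
  deriving (Eq, Show)

data Parameter = Parameter
  { paramName     :: String
  , paramLocation :: ParamLocation
  , paramRequired :: Bool
  , paramType     :: ParamType
  } deriving (Eq, Show)

data Response = Response
  { responseStatus :: Int            -- HTTP status code
  , responseSchema :: Maybe String   -- name of the referenced schema, if any
  } deriving (Eq, Show)

data Endpoint = Endpoint
  { endpointMethod     :: String
  , endpointPath       :: String
  , endpointParameters :: [Parameter]
  , endpointResponses  :: [Response]
  } deriving (Eq, Show)

-- Parameters the caller must always supply.
requiredParams :: Endpoint -> [Parameter]
requiredParams = filter paramRequired . endpointParameters

-- Responses that signal failure; candidates for generated error handling.
errorResponses :: Endpoint -> [Response]
errorResponses = filter ((>= 400) . responseStatus) . endpointResponses

-- Hypothetical endpoint description, written by hand for illustration.
exampleGetUser :: Endpoint
exampleGetUser = Endpoint "GET" "/users/{user_id}"
  [ Parameter "user_id" InPath True PString ]
  [ Response 200 (Just "User"), Response 401 Nothing, Response 404 Nothing ]
```

Once the document is parsed into values like `exampleGetUser`, later stages can ask simple questions such as `requiredParams` or `errorResponses` without touching raw JSON.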

I haven't counted exactly how many of these concepts we'd want to model, but it shouldn't be more than about 20. Certainly, most endpoints only use a handful each.

Once we have this data in a suitable internal representation, most of the binding generation should be syntax-directed. We may need a mechanism for overriding code generation for specific endpoints in some cases. I haven't seen evidence that we need this, but it's a common thing to need in code generators.
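
One possible shape for this, sketched under assumptions: the default translation has one case per constructor of the internal representation, and an optional table of per-endpoint overrides is consulted first. The types are cut down to what the example needs, and every name here (`Endpoint`, `generateSignature`, `Overrides`, the sample endpoint) is hypothetical.

```haskell
import Data.List (intercalate)

data ParamType = PString | PInteger | PBoolean
  deriving (Eq, Show)

data Parameter = Parameter
  { paramName     :: String
  , paramRequired :: Bool
  , paramType     :: ParamType
  } deriving (Eq, Show)

data Endpoint = Endpoint
  { operationId :: String        -- used as the generated function name
  , parameters  :: [Parameter]
  , resultType  :: String        -- assumed to be a single type name
  } deriving (Eq, Show)

-- One case per constructor: this is the syntax-directed part.
haskellType :: ParamType -> String
haskellType PString  = "Text"
haskellType PInteger = "Int"
haskellType PBoolean = "Bool"

-- Optional parameters become Maybe arguments.
paramArgType :: Parameter -> String
paramArgType p
  | paramRequired p = haskellType (paramType p)
  | otherwise       = "Maybe " ++ haskellType (paramType p)

-- Default translation from an endpoint to a type signature.
defaultSignature :: Endpoint -> String
defaultSignature e =
  operationId e ++ " :: "
    ++ intercalate " -> " (map paramArgType (parameters e) ++ ["IO " ++ resultType e])

-- Per-endpoint overrides, keyed by operation id.
type Overrides = [(String, Endpoint -> String)]

-- Use an override when one is registered, otherwise the default rule.
generateSignature :: Overrides -> Endpoint -> String
generateSignature overrides e =
  case lookup (operationId e) overrides of
    Just custom -> custom e
    Nothing     -> defaultSignature e

-- Hypothetical endpoint used to exercise the generator.
exampleGetPosts :: Endpoint
exampleGetPosts = Endpoint "getPostsForChannel"
  [ Parameter "channel_id" True PString
  , Parameter "page" False PInteger
  ]
  "Posts"
```

With an empty override table, `generateSignature [] exampleGetPosts` follows the default rule; registering an entry for `"getPostsForChannel"` replaces the output for that endpoint only, leaving the rest of the generator untouched.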

Risks

  1. The OpenAPI document has errors: This has already come up once. The Mattermost team was quite pleased to receive a fix and merged it almost immediately as it was a simple whitespace change. Other, more significant, errors could exist in the API reference. I think the main mitigation is to use our own fork of the API reference, so that we can edit it and choose when to update, while being mindful to upstream fixes ASAP.

  2. Unknowns in processing the OpenAPI document: The initial investigation was largely superficial. There could be lots of issues lurking in the translation process that have not been identified yet.

  3. We may need a shim between the generated code and the API we want to expose: Generating the exact API we want to expose to users may not be easy. Some manual work may be required to massage the generated API.

  4. OpenAPI and/or the Mattermost API reference may change significantly in the future.

Level of Effort

  • Modeling: Initially most of the work will be creating Haskell types to model the API reference and converting YAML/JSON values into those types. This could easily take a week or two, scaling in proportion to how many types we need to make.

  • Code generation: We should be able to use a package like haskell-src-exts as a quasiquoter so that code generation is templated. This will make it easy to generate syntactically valid Haskell. I'm not sure how long this step will take, but we'll need some time to write utility code common to each generator, and for each concept we'll need:

    • time to figure out what code we want to generate
    • time to write the generator
    • testing/validation

    I expect most of the effort to go into the first portion, figuring out what to generate.

  • Refactor Matterhorn to use new API: Hard to estimate until we know what the generated API looks like.
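
To make the code generation item more concrete, here is a minimal sketch of one way haskell-src-exts can help guarantee syntactically valid output. It does not use a quasiquoter; instead it fills a plain string template and then round-trips the result through the library's parser and pretty-printer, so malformed output is rejected before it is written. The helper names `signatureTemplate` and `validateDecl` are assumptions made for this example.

```haskell
import Language.Haskell.Exts (ParseResult (..), parseDecl, prettyPrint)

-- Fill a textual template for a type signature.
-- (Hypothetical helper; argument and result types are given as strings.)
signatureTemplate :: String -> [String] -> String -> String
signatureTemplate name argTypes result =
  name ++ " :: " ++ concatMap (++ " -> ") argTypes ++ "IO " ++ result

-- Parse the candidate text as a single declaration. On success, return
-- the pretty-printed declaration; on failure, return the parser's message.
validateDecl :: String -> Either String String
validateDecl src =
  case parseDecl src of
    ParseOk decl      -> Right (prettyPrint decl)
    ParseFailed _ err -> Left err
```

For instance, `validateDecl (signatureTemplate "getUser" ["Text"] "User")` should come back as a `Right`, while a malformed template result such as `"getUser :: -> IO"` should come back as a `Left` carrying a parse error.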