Background

At Yipee.io, we recently added team functionality to our modeling tool. Teams provide support for sharing applications and are tied into our pricing, letting teams purchase plans with different sets of associated features, at the cost of some added complexity for us. We agreed on the relationships between users, teams, plans, and features, but it wasn’t necessarily clear up front exactly how the UI would need to navigate those relationships. We might want to find the features defined for a user but, alternatively, we might want to find all users, teams¹, and plans that possess a particular feature. At the time, for historical reasons, we had a backend system driven by REST APIs that stored data in MongoDB.

MongoDB is an excellent choice for certain use cases (particularly very high data volumes) but a complicated graph of relations, which can be navigated from multiple endpoints along a wide variety of paths, is not a particularly good fit. We decided to switch our backend to a relational database as we didn’t anticipate ever reaching a volume of data that would make a relational implementation problematic. Now that multiple vendors support RDBMS as a service, the decision to use a relational database has been made even easier. Having had positive experiences with PostgreSQL in the past, we decided to give it a try. Our DaaS vendor supported PostgreSQL in addition to MongoDB, so the switch could be made fairly easily. As it turned out, PostgreSQL was quite possibly the ideal choice for this application. I’ll discuss its unique qualifications later in the post.

Once we had decided on a relational database, a similar thought process ensued around our REST APIs. The document model behind REST is a somewhat poor fit for a relationship graph with properties on the associations, which is exactly what we were managing: users, teams, plans, and features, all cross-linked.

We could have mapped out REST endpoints for our immediate operational needs, but we might easily have been chasing use cases for some time as the UI or another API user needed different collections of data. An initial dataset for a user identity needed by the UI was:

  • The user’s id, terms, and features
  • Each org¹ containing the user, along with:
      ◦ its id, name, and owner (with id and identity)
      ◦ its sets of users, admins, and writers (each member with id and identity)
      ◦ its plan (with id and name)
      ◦ its set of features (with id and name) and the set of features for its plan (with id and name)

The backend API needed a similar but somewhat different dataset to establish a user context.

These sorts of queries make for an odd REST interface, so we decided to move away from REST for our team model and use GraphQL instead. The rest of this post covers our experience using GraphQL with PostgreSQL to implement the team model.

¹Teams are called orgs in the schema since there are orgs for individuals and we wanted to talk about true team plans separately. Single-person orgs work just like orgs associated with larger teams.

GraphQL

There are many excellent sources for information about GraphQL online. In particular, this site has a wealth of useful material. That said, I’ll give a brief overview here for context. GraphQL is an alternative to REST that allows clients to design their own queries instead of requiring that all requests be defined by the server. This works because GraphQL is schema-based. A GraphQL server defines:

  • a schema which specifies all the meaningful relationships between the elements of a data model
  • a set of queries which are entry points into the schema for retrieving data
  • a set of mutations which are the operations that may be performed on model data

Unlike a REST API, a query entry point does not itself define what data may be retrieved. For example, here is a query from our team model:

userByIdentity(service: String!, identity: String!): User

This says that a client can retrieve a user given a service (e.g. “GitHub”) and an identity (e.g. GitHub id). However, the user is just the beginning. Here are the descriptions of the user and org types in our schema:

type User implements FeatureOwner {
    # The ID of the user
    id: String!
    # What external service authenticates this user?
    service: String!
    # What is the user's id in the external service?
    identity: String!
    # Has the user accepted terms (default: false)?
    terms: Boolean!
    # The user's orgs, if any
    orgs: [Org!]!

    ...
}

type Org implements FeatureOwner {
    # The ID of the org
    id: String!
    # What is this org called
    name: String!
    # This org's plan
    plan: Plan!
    # The owner of the org (point of contact)
    owner: User!
    # The number of users with write privileges
    maxWriters: Int!

    ...
}

An exclamation point (!) means that the item may not be null. The user definition says that a user consists of three string values, one boolean, and three lists containing instances of associated types (similar for an organization). A GraphQL schema contains type descriptions like this plus definitions for the queries and updates (called “Mutations”) supported by the server. Our queries and mutations are:

type Query {
    # look up a user by name
    userByIdentity(service: String!, identity: String!): User

    # look up a feature by id
    feature(id: String!): Feature

    # find out the current log level
    logLevel: LogLevel

    ...
}

type Mutation {
    # update a user; the response may contain user-visible errors
    # if, for example, an attempt is made to update a user name to
    # an existing name
    updateUserByIdentity(service: String!, identity: String!, newData: UserUpdateInput!): UserChangeResponse

    # add a new feature
    addFeature(feature: FeatureInput!): FeatureChangeResponse

    # delete a feature
    deleteFeature(feature: String!): FeatureDeleteResponse

    # set the current log level
    setLogLevel(level: LogLevel!): LogLevel

    ...
}

To me, this looks a lot like some sort of RPC definition. The special sauce that makes GraphQL powerful, however, is the client’s ability to select which data she wants returned.

Let’s go back to that first query:

userByIdentity(service: String!, identity: String!): User

Here is a use of it that requests the value of the “terms” field:

query {
   userByIdentity(service: "github", identity: "plissken") {
      terms
   }
}

This is passed to a GraphQL server via an HTTP POST. All operations for the API go through the same HTTP end point. In curl, this would look like:

curl -XPOST -H 'Content-Type: application/json' -d '{"query": "{userByIdentity(service:\"github\", identity:\"plissken\") {terms}}"}' http://localhost:8128/query
{"data":{"userByIdentity":{"terms":true}}}

The returned payload, inside the response’s top-level "data" field, is:

{
    "userByIdentity": {
        "terms": true
    }
}
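Any HTTP client can issue the same request. Here is a minimal Python sketch assuming the same local endpoint as the curl example; the two helper functions are illustrative, not part of any GraphQL library:

```python
import json

def graphql_payload(query: str) -> str:
    """Build the JSON request body expected by the /query endpoint."""
    return json.dumps({"query": query})

def extract(response_text: str, *path):
    """Walk the top-level "data" envelope of a GraphQL response."""
    node = json.loads(response_text)["data"]
    for key in path:
        node = node[key]
    return node

# Sending the request (assumes a server at localhost:8128, as above):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8128/query",
#     data=graphql_payload('{userByIdentity(service:"github", '
#                          'identity:"plissken") {terms}}').encode(),
#     headers={"Content-Type": "application/json"})
# body = urllib.request.urlopen(req).read().decode()

# Using the canned response from the curl example:
canned = '{"data":{"userByIdentity":{"terms":true}}}'
print(extract(canned, "userByIdentity", "terms"))  # prints: True
```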

Notice that we received the value for the “terms” field. What if we ask for more?

query {
   userByIdentity(service: "github", identity: "plissken") {
      terms ownedOrgs {name maxWriters}
   }
}

Now we get:

{
    "userByIdentity": {
        "ownedOrgs": [
            {
                "maxWriters": 1,
                "name": "plissken"
            }
        ],
        "terms": true
    }
}

So we can specify the fields we want for our types, and even the fields of referenced objects, recursively. Here we said we wanted the “name” and “maxWriters” fields for each org owned by plissken. As one final example, let’s ask for the admins of each owned org:

query {
   userByIdentity(service: "github", identity: "plissken") {
      terms ownedOrgs {name maxWriters admins {identity}}
   }
}

We get:

{
    "userByIdentity": {
        "ownedOrgs": [
            {
                "admins": [
                    {
                        "identity": "plissken"
                    }
                ],
                "maxWriters": 1,
                "name": "plissken"
            }
        ],
        "terms": true
    }
}

Clearly, this is very powerful. Once you establish a schema, a GraphQL client can select the information required for a particular task as needed. Also, if we decide later that we need more data for a particular type, we can add it and all existing calls will remain unaffected.

Mutations

Mutations are very much like RPC calls except that they allow specification of the data to be returned, just like queries. If we set out to update our user, we can use our updateUserByIdentity mutation:

updateUserByIdentity(
    service: String!,
    identity: String!,
    newData: UserUpdateInput!): UserChangeResponse

This says that we can update a user by specifying her service and identity values and an instance of UserUpdateInput for the new data. The update returns a UserChangeResponse which has fields for the user and any errors that were generated. Just like in a query, the returned user data can be expanded by specifying fields of interest. Here’s an example:

mutation {
    updateUserByIdentity(
        service: "github",
        identity: "plissken",
        newData: {identity: "snake"}) {
        user {id identity}
        errors
    }
}

This call returns:

{
    "updateUserByIdentity": {
        "errors": [],
        "user": {
            "id": "7aa512a6-0e85-11e7-8a21-67cbd37f90e2",
            "identity": "snake"
        }
    }
}
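On the client side, checking the errors field before using the returned user keeps failure handling in one place. Here is a minimal Python sketch using the sample response above (note that, on the wire, the payload arrives wrapped in the top-level "data" object, as in the earlier curl output):

```python
import json

# The UserChangeResponse from the sample above, wrapped in the
# "data" envelope that GraphQL responses carry on the wire.
response_text = """
{"data": {"updateUserByIdentity": {
    "errors": [],
    "user": {"id": "7aa512a6-0e85-11e7-8a21-67cbd37f90e2",
             "identity": "snake"}}}}
"""

result = json.loads(response_text)["data"]["updateUserByIdentity"]
if result["errors"]:
    # e.g. an attempt to rename a user to an existing name
    raise RuntimeError("; ".join(result["errors"]))
user = result["user"]
print(user["identity"])  # prints: snake
```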

PostgreSQL

We picked up substantial benefits right away for this kind of highly connected model just by switching to a relational database. Supporting navigation from multiple directions, plus per-organization user sets with different properties (e.g. admins, writers), would require redundancies in a document model and would make it impossible to perform updates transactionally. Additionally, the integrity constraints provided by an RDBMS remove an entire class of errors we would otherwise need to handle. One simple example: a client tries to add a user to an organization but presents an invalid user id. This is caught for us by a foreign-key constraint requiring that the user field in the table connecting orgs and users reference an entry in the users table. Done! None of this is probably news to anyone reading this, but I feel like some of the benefits of an RDBMS are often forgotten today amid the popularity of NoSQL databases.
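That org-membership example takes only a few lines of SQL. The sketch below uses SQLite via Python’s sqlite3 module so it is self-contained; the table and column names are illustrative, not our actual schema, and the equivalent PostgreSQL DDL is essentially identical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-only; PostgreSQL always enforces FKs
conn.executescript("""
    CREATE TABLE users (id TEXT PRIMARY KEY);
    CREATE TABLE orgs  (id TEXT PRIMARY KEY);
    -- The join table connecting orgs and users: both columns must
    -- reference existing rows.
    CREATE TABLE org_users (
        org_id  TEXT NOT NULL REFERENCES orgs(id),
        user_id TEXT NOT NULL REFERENCES users(id),
        PRIMARY KEY (org_id, user_id)
    );
""")
conn.execute("INSERT INTO users VALUES ('u1')")
conn.execute("INSERT INTO orgs  VALUES ('o1')")
conn.execute("INSERT INTO org_users VALUES ('o1', 'u1')")  # valid membership
try:
    # An invalid user id is rejected by the database itself; no
    # application-level validation code is needed.
    conn.execute("INSERT INTO org_users VALUES ('o1', 'nope')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```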

In any case, we were able to easily map our GraphQL schema onto a relational model. We’ll see where PostgreSQL really shines when we talk about optimizations in Part 2.

Docker

I believe it’s valuable to discuss Docker in the context of future-proofing services because we gained great leverage from our use of Docker. Changing our underlying database technology could easily have had a significant impact on development, for the following reasons:

  • Our UI developers were working concurrently on unrelated features.
  • We didn’t want them to wait until our transition was complete to get started.
  • Deploying a database and tying it together with related services can be complicated and time-consuming.
  • Swapping out environments to work on different tasks often produces a lot of development friction.

We got around these issues through judicious use of Docker. Here at Yipee.io, we model our own application using our modeler and run it using Docker. This means that our UI developers could simply use docker-compose to bring up or shut down our entire set of services (including the database). They could bring up one set of services to do unrelated development and switch to the new database version in about 30 seconds. Also, as we added features or fixed bugs, we could push a new version of our team functionality as a Docker image and they could incorporate it with little to no effort. They, in turn, could generate a new UI image to hand off to our UX team for review. Such short cycle times radically change the time and resources required to react to user feedback and new requirements. For more on how we leverage Docker day to day, see this post.

Part 2

In Part 2, we’ll discuss how GraphQL servers are built and how PostgreSQL offers significant leverage for future-proofing their construction.


Interested in learning more about Yipee.io? Sign up for free to see how Yipee.io can help your team streamline their development process.