GraphQL with Java and Kotlin - Using BatchLoaders
Addressing the N+1 Query problem with GraphQL in Java and Kotlin using BatchLoaders
GraphQL, as its official website puts it, is a query language for your API. One of its best aspects is that, rather than defining endpoints, you expose a graph of types that clients can interact with, which frees you from having to know up front how clients will want to request data.
Exposing a graph via a schema can be liberating for your data, but this superpower is not without cost: you still have to design and implement the data fetching logic that supports the graph. One of the more subtle challenges when working with GraphQL is ensuring that this data fetching logic is efficient.
A well-known performance problem when getting started with GraphQL is the N+1 query problem. Let’s take a look at how we can solve it using DataLoaders, a concept developed to batch and cache data fetches within the context of a single GraphQL request.
An example N+1 problem with DataFetchers
Let’s start with a schema that contains a User and Project. We will also add a Query type that allows for grabbing all users.
type User {
    id: ID
    email: String
    name: String
    projects: [Project]
}

type Project {
    id: ID
    name: String
}

type Query {
    users: [User]
}
Now that our schema is defined we will need to create a few DataFetcher objects to grab the data. Those might look something like the following if you are using the graphql-java library with Kotlin.
import graphql.schema.DataFetcher

// Triggered when a client sends the `users` query
private val userFetcher = DataFetcher<List<User>> { _ ->
    // A Kotlin lambda returns its last expression; a bare `return` is not allowed here
    listOf(
        User("1", "1@test.com", "One"),
        User("2", "2@test.com", "Two")
    )
}

// Triggered once per User to load that user's `projects` field
private val projectFetcher = DataFetcher<List<Project>> { env ->
    // The source of a nested field is its parent object, here the User
    val user: User = env.getSource()
    getProjectsForUser(user.id)
}

private fun getProjectsForUser(userId: String): List<Project> {
    TODO("Fetch the projects for a given user")
}
With this in place we deploy our API and we start seeing clients execute the following query.
query users {
    users {
        id
        email
        name
        projects {
            id
            name
        }
    }
}
When GraphQL executes this query it will first grab all users from the userFetcher. It will then iterate over the users and call the projectFetcher once per User, loading each user's Project objects individually. There are two main issues with this query path.
- We make one query to load the users and then one more query for each of the N users returned, N+1 queries in total. This behavior is known as the N+1 query problem, and it would be far more efficient to load all of the projects at once or in batches.
- It is possible that many of the User objects will have the same Project and we should be caching and reusing those within the context of the request.
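To make the cost concrete, here is a minimal sketch that counts round trips for both approaches. It uses plain Kotlin only; `FakeProjectStore`, the sample users, and the `projectIds` property are hypothetical stand-ins for a real data source, not part of graphql-java.

```kotlin
data class Project(val id: String, val name: String)
data class User(val id: String, val name: String, val projectIds: List<String>)

// Hypothetical in-memory stand-in for a database.
// Every call to findByIds counts as one round trip.
class FakeProjectStore {
    var queryCount = 0
        private set

    private val projects = listOf(
        Project("p1", "Alpha"), Project("p2", "Beta"), Project("p3", "Gamma")
    ).associateBy { it.id }

    // One round trip per call, however many IDs are requested.
    fun findByIds(ids: Collection<String>): List<Project> {
        queryCount++
        return ids.mapNotNull { projects[it] }
    }
}

val users = listOf(
    User("1", "One", listOf("p1", "p2")),
    User("2", "Two", listOf("p2", "p3")),
    User("3", "Three", listOf("p1"))
)

// Naive resolution: one project query per user (the "N" in N+1).
fun resolvePerUser(store: FakeProjectStore): Map<String, List<Project>> =
    users.associate { user -> user.id to store.findByIds(user.projectIds) }

// Batched resolution: collect the distinct IDs, query once, then fan out.
fun resolveBatched(store: FakeProjectStore): Map<String, List<Project>> {
    val byId = store.findByIds(users.flatMap { it.projectIds }.distinct())
        .associateBy { it.id }
    return users.associate { user -> user.id to user.projectIds.mapNotNull { byId[it] } }
}

fun main() {
    val naive = FakeProjectStore()
    resolvePerUser(naive)
    println("per-user queries: ${naive.queryCount}") // per-user queries: 3

    val batched = FakeProjectStore()
    resolveBatched(batched)
    println("batched queries: ${batched.queryCount}") // batched queries: 1
}
```

Both functions return the same data. The batched version also fetches the shared projects `p1` and `p2` only once, which covers the second issue in the list.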
Improving performance with DataLoaders
In an ideal scenario we would fetch all of the Project objects we need in a batch. This is where the concept of DataLoaders comes into play.
graphql-java integrates with the java-dataloader library, which lets us define a BatchLoader: a function that takes a list of keys and returns the matching values. Wrapped in a DataLoader, it batches and caches Project lookups within the scope of a single GraphQL request. The BatchLoader for Projects might look something like the following.
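import org.dataloader.BatchLoader
import java.util.concurrent.CompletableFuture
import java.util.concurrent.CompletionStage

class BatchProjectLoader : BatchLoader<String, Project> {
    // Receives all of the keys queued since the last dispatch in a single call
    override fun load(keys: List<String>): CompletionStage<List<Project>> {
        return CompletableFuture.supplyAsync { getAllProjects(keys) }
    }
}

private fun getAllProjects(projectIds: List<String>): List<Project> {
    // Must return one Project per ID, in the same order as projectIds
    TODO("Fetch a list of projects by their IDs")
}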
Now that we have our BatchLoader we need to update our DataFetcher that retrieves Project objects to use our BatchLoader instead. The updated fetcher will look something like the following.
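private val projectFetcher = DataFetcher<CompletableFuture<List<Project>>> { env ->
    // The source of a nested field is its parent object, here the User
    val user: User = env.getSource()
    // Grab the DataLoader registered under the name "projects" for this request
    val dataLoader = env.getDataLoader<String, Project>("projects")
    // You can pass in whatever you would like as keys to get projects,
    // just make sure your BatchLoader takes this into account.
    // loadMany queues every ID and returns a future that completes after dispatch.
    dataLoader.loadMany(user.projectIds)
}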
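The fetcher looks up a loader named "projects", which assumes one was registered for the current request. A minimal sketch of that wiring follows, assuming java-dataloader 3.x (where `DataLoaderFactory` is the factory class) and an already configured `GraphQL` instance. The names `projectBatchLoader`, `newRequestRegistry`, and `executeWithLoaders` are hypothetical, and the batch function is a placeholder rather than a real query.

```kotlin
import graphql.ExecutionInput
import graphql.ExecutionResult
import graphql.GraphQL
import org.dataloader.BatchLoader
import org.dataloader.DataLoaderFactory
import org.dataloader.DataLoaderRegistry
import java.util.concurrent.CompletableFuture

data class Project(val id: String, val name: String)

// Placeholder batch function: a real one would run a single query for all IDs.
val projectBatchLoader = BatchLoader<String, Project> { ids ->
    CompletableFuture.supplyAsync { ids.map { Project(it, "Project $it") } }
}

// Build a fresh registry per request so cached values never leak between requests.
fun newRequestRegistry(): DataLoaderRegistry =
    DataLoaderRegistry().register("projects", DataLoaderFactory.newDataLoader(projectBatchLoader))

// `graphQL` is assumed to be built elsewhere from the schema and data fetchers.
fun executeWithLoaders(graphQL: GraphQL, query: String): ExecutionResult {
    val input = ExecutionInput.newExecutionInput()
        .query(query)
        .dataLoaderRegistry(newRequestRegistry())
        .build()
    return graphQL.execute(input)
}
```

How and when the registered loaders are dispatched differs between graphql-java versions, so it is worth checking the documentation for the version you are using.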
With this in place, GraphQL will first fetch the User objects. Our projectFetcher will then be called for each User, and each call queues its Project keys on the DataLoader instead of querying right away. At some point after those calls the DataLoader is dispatched, and the BatchLoader retrieves all of the queued Project objects in one call, completing the pending result of each fetcher.
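The queue, dispatch, and cache steps can be sketched in a few lines of plain Kotlin. `MiniLoader` is a simplified, hypothetical stand-in written for illustration. It is not how java-dataloader is implemented, and it assumes the batch function returns values in key order.

```kotlin
import java.util.concurrent.CompletableFuture

// Simplified illustration of a data loader: queue keys, dispatch once, cache results.
class MiniLoader<K, V>(private val batchFn: (List<K>) -> List<V>) {
    private val cache = mutableMapOf<K, CompletableFuture<V>>()
    private val pending = mutableListOf<K>()

    // Returns a future immediately; a key seen before reuses its cached future.
    fun load(key: K): CompletableFuture<V> =
        cache.getOrPut(key) {
            pending += key
            CompletableFuture<V>()
        }

    // Resolves every queued key with a single call to the batch function.
    fun dispatch() {
        if (pending.isEmpty()) return
        val keys = pending.toList()
        pending.clear()
        val values = batchFn(keys) // assumed to be in the same order as keys
        keys.zip(values).forEach { (key, value) -> cache.getValue(key).complete(value) }
    }
}

fun main() {
    var batchCalls = 0
    val loader = MiniLoader<String, String> { ids ->
        batchCalls++
        ids.map { "Project $it" }
    }

    // Three field resolutions ask for projects; "p1" is requested twice.
    val first = loader.load("p1")
    val second = loader.load("p2")
    val third = loader.load("p1")

    loader.dispatch()

    println(first.get())     // Project p1
    println(second.get())    // Project p2
    println(first === third) // true
    println(batchCalls)      // 1
}
```

The repeated request for `p1` reuses the cached future, so the batch function runs once and receives only the two distinct keys.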
The result is a request-scoped cache of the objects in our response graph, and the N+1 query problem is addressed by batching the N queries together. This cuts both the number of round trips and the duplicated work in our data fetching.