pg_graphql: A GraphQL extension for PostgreSQL

2021-12-03

ā€¢

7 minute read

šŸ†• pg_graphql is now generally available and has undergone significant enhancements since this announcement. Here is what is new:

Today we're open sourcing pg_graphql, a native PostgreSQL extension adding GraphQL support. The extension keeps schema generation, query parsing, and resolvers all neatly contained on your database server requiring no external services.

pg_graphql inspects an existing PostgreSQL schema and reflects a GraphQL schema with resolvers that are:

  • performant
  • always up-to-date
  • compliant with best practices
  • serverless
  • open source

Interested? You're 3 commands away from a live GraphiQL demo.

Motivation

The Supabase stack is centered around PostgreSQL as the single source of truth. All data, configuration, and security are housed in the database so any GraphQL solution needed to be equivalently SQL-centric.

With that in mind, we took a look at the landscape and considered two excellent technologies, Graphile, and Hasura.

RequirementsGraphileHasura
Open Sourceāœ…āœ…
Reflected GraphQL Schemaāœ…āœ…
Reflected Resolversāœ…āœ…
Always up-to-dateāœ…āœ…
Performantāœ…āœ…

We found both options to be largely viable for the core feature set.

Which left us with one final hang-up: we host free-tier projects on VMs with 1 GB of memory. After tallying the resources reserved for PostgreSQL, PostgREST, Kong, GoTrue, and a handful of smaller services, we were left with a total memory budget of ... 0 MB šŸ˜¬. Unsurprisingly, our pathological memory target disqualified any option that required launching another process in those VMs.

For that reason, we decided to invest in a lightweight alternative that runs in the database, and can be exposed over HTTP using the existing PostgREST deployments' RPC functionality.

By our most conservative estimate, that reduces the platform's memory requirements by 525 TB/hours every month, saving šŸ’° and šŸŒ³.

Design

As a native PostgreSQL extension, pg_graphl is written in a combination of C and SQL. Each GraphQL query is parsed, validated, and transpiled to SQL, all within the database.

Each GraphQL request is resolved by a single SQL statement. That SQL statement aggregates requested data as a JSON document to return to the caller. This approach results in blazing fast response times, avoids the N+1 query problem, and hits the theoretical minimum achievable network IO overhead of any GraphQL to SQL resolver. No special permissions are required for the PostgreSQL role executing queries, so pg_graphql is fully compatible with your existing row level security policies.

Embedding the GraphQL server directly in the database allows us to leverage PostgreSQL's built-in solutions for common challenges:

Caching ā†’ PREPARE STATEMENT

Errors ā†’ RAISE EXCEPTION

Bad Data ā†’ ROLLBACK

Authorization ā†’ CREATE POLICY

Similarly, pg_graphql benefits from PostgreSQL's strong ACID guarantees and can expose them through its API.

Ever wanted to execute multiple operations in a single transaction? Each request is managed in a single transaction so with a multi-operation GraphQL request and pg_graphql, that behavior falls out for free!

Schema Reflection

As a limited example of how the reflection engine works, here's how it converts a single table into a full GraphQL schema.

hideCopy

_10
# schema.sql
_10
create table account (
_10
id serial primary key,
_10
email varchar(255) not null,
_10
created_at timestamp not null,
_10
updated_at timestamp not null
_10
);

Translates into

hideCopy

_37
# schema.graphql
_37
scalar Cursor
_37
scalar DateTime
_37
scalar JSON
_37
scalar UUID
_37
scalar BigInt
_37
_37
type PageInfo {
_37
hasNextPage: Boolean!
_37
hasPreviousPage: Boolean!
_37
startCursor: String!
_37
endCursor: String!
_37
}
_37
_37
type Query {
_37
account(nodeId: ID!): Account
_37
allAccounts(after: Cursor, before: Cursor, first: Int, last: Int): AccountConnection
_37
}
_37
_37
type Account {
_37
nodeId: ID!
_37
id: String!
_37
email: String!
_37
createdAt: DateTime!
_37
updatedAt: DateTime!
_37
}
_37
_37
type AccountEdge {
_37
cursor: String!
_37
node: Account
_37
}
_37
_37
type AccountConnection {
_37
totalCount: Int!
_37
pageInfo: PageInfo!
_37
edges: [AccountEdge]
_37
}

Where Query type's account field selects a single account by its globally unique ID and allAccounts enables pagination via the relay connections specification. Under the SQL hood, iterating through pages is handled using keyset pagination giving consistent retrieval times on every page.

For a more complete examples with relationships, enums, and more exotic types check out the API doc.

API

pg_graphql's public API is a single SQL function that returns JSON.

hideCopy

_10
gql.resolve(
_10
stmt text, -- the graphql query/mutation
_10
variables jsonb default '{}'::jsonb, -- key value pairs
_10
)
_10
returns jsonb

For example, a GraphQL query selecting the id field for a collection of type Book would look like this:

hideCopy

_17
gqldb= select gql.resolve($$
_17
_17
query {
_17
allBooks {
_17
edges {
_17
node {
_17
id
_17
}
_17
}
_17
}
_17
}
_17
_17
$$);
_17
_17
resolve
_17
----------------------------------------------------------------------
_17
{"data": {"allBooks": {"edges": [{"node": {"id": 1}}]}}, "errors": []}

We're opting to expose the function over HTTP through PostgREST but you could also connect to the PostgreSQL database and call the function directly from your server code in any programming language.

Performance

When it comes to APIs, performance counts. Here are some figures from Apache Bench showing 2,205 requests/second on a 4 core machine with 16 GB of memory.

hideCopy

_11
Concurrency Level: 8
_11
Time taken for tests: 3.628 seconds
_11
Complete requests: 8000
_11
Failed requests: 0
_11
Total transferred: 1768000 bytes
_11
Total body sent: 1928000
_11
HTML transferred: 368000 bytes
_11
Requests per second: 2205.21 [#/sec] (mean)
_11
Time per request: 3.628 [ms] (mean)
_11
Time per request: 0.453 [ms] (mean, across all concurrent requests)
_11
Transfer rate: 475.93 [Kbytes/sec] received

Full steps to reproduce this output are available in the docs.

Open Source

pg_graphql is open source software. As always, Issues and PRs are welcome.

Try pg_graphql today to see a live GraphiQL demo.

More pg_graphql

Share this article

Build in a weekend, scale to millions