Skip to content

Commit

Permalink
feat: more pg-typed awesomeness (#217)
Browse files Browse the repository at this point in the history
  • Loading branch information
ForbesLindesay authored Jan 21, 2022
1 parent ab76428 commit 793a37d
Show file tree
Hide file tree
Showing 14 changed files with 1,163 additions and 237 deletions.
98 changes: 0 additions & 98 deletions docs/blog/2022-01-19-release-pg-bulk.md

This file was deleted.

180 changes: 180 additions & 0 deletions docs/blog/2022-01-21-optimizing-postgres-using-unnest.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@
---
title: 'Postgres UNNEST cheat sheet for bulk operations'
author: Forbes Lindesay
authorURL: http://twitter.com/ForbesLindesay
authorTwitter: ForbesLindesay
googleSummary: UNNEST is the only way to make Postgres queries fast and reliable, if you want to interact thousands of rows at a time.
ogSummary: UNNEST is the only way to make Postgres queries fast and reliable, if you want to interact thousands of rows at a time.
---

Postgres is normally very fast, but it can become slow (or even fail completely), if you have too many parameters in your queries. When it comes to operating on data in bulk, `UNNEST` is the only way to achieve fast, reliable queries. This post has examples for using `UNNEST` to do all types of bulk transactions.

<!--truncate-->

All the examples in this article assume a database schema that looks like:

```sql
CREATE TABLE users (
email TEXT NOT NULL PRIMARY KEY,
favorite_color TEXT NOT NULL
)
```

## INSERT thousands of records in one go

To insert many records into a Postgres table in one go, the most efficient method is to provide each column as a separate array and then use `UNNEST` to construct the rows to insert.

You can run the following query:

```sql
INSERT INTO users (email, favorite_color)
SELECT
UNNEST(?::TEXT[]),
UNNEST(?::TEXT[])
```

With parameters like:

```json
[
["[email protected]", "[email protected]", "[email protected]"],
["red", "green", "indigo"]
]
```

Notice that you're only passing 2 parameters, no matter how many rows you want to insert. You're also using the same query text no matter how many rows you want to insert. This is what keeps the query so efficient.

The resulting table would look like:

| email | favorite_color |
| ---------------- | -------------- |
| [email protected] | red |
| [email protected] | green |
| [email protected] | indigo |

## UPDATE multiple records to different values in a single query

One of the most powerful use cases of `UNNEST` is to update multiple records in a single query. The normal `UPDATE` statement only really lets you update multiple records in one go if you want to set them all to the same value, but this approach is way more flexible.

You can run the following query:

```sql
UPDATE users
SET
favorite_color=bulk_query.updated_favorite_color
FROM
(
SELECT
UNNEST(?::TEXT[])
AS email,
UNNEST(?::TEXT[])
AS updated_favorite_color
) AS bulk_query
WHERE
users.email=bulk_query.email
```

With parameters like:

```json
[
["[email protected]", "[email protected]", "[email protected]"],
["purple", "violet", "orange"]
]
```

The resulting table would then look like:

| email | favorite_color |
| ---------------- | -------------- |
| [email protected] | purple |
| [email protected] | violet |
| [email protected] | orange |

Not only does this let you update all these records in one statement, but the number of parameters remains fixed at 2 now matter how many rows you want to update.

## SELECT with thousands of different conditions in one go

You can always build a very large query by combining `OR` and `AND`, but eventually, if you have enough parameters, this may start to become slow.

You can run the following query:

```sql
SELECT * FROM users
WHERE (email, favorite_color) IN (
SELECT
UNNEST(?::TEXT[]),
UNNEST(?::TEXT[])
)
```

With parameters like:

```json
[
["[email protected]", "[email protected]", "[email protected]"],
["purple", "violet", "orange"]
]
```

and it will be equivalent to running:

```sql
SELECT * FROM users
WHERE
(email='[email protected]' AND favorite_color='purple')
OR (email='[email protected]' AND favorite_color='violet')
OR (email='[email protected]' AND favorite_color='orange')
```

Using `UNNEST` here lets us keep the query constant, and use only 2 parameters, regardless of how many conditions we want to add.

An alternative if you need more control can be to use an INNER JOIN instead of the `IN` part of the query. For example, if you needed the tests to be case insensitive you could do:

```sql
SELECT users.* FROM users
INNER JOIN (
SELECT
UNNEST(?::TEXT[]) AS email,
UNNEST(?::TEXT[]) AS favorite_color
) AS unnest_query
ON (LOWER(users.email) = LOWER(unnest_query.email) AND LOWER(user.favorite_color) = LOWER(unnest_query.favorite_color))
```

## DELETE with thousands of different conditions in one go

Just like `SELECT`, `DELETE` queries can become slow if the complexity of your conditions grows too extreme.

You can run the following query:

```sql
DELETE FROM users
WHERE (email, favorite_color) IN (
SELECT
UNNEST(?::TEXT[]),
UNNEST(?::TEXT[])
)
```

With parameters like:

```json
[
["[email protected]", "[email protected]", "[email protected]"],
["purple", "violet", "orange"]
]
```

and it will be equivalent to running:

```sql
DELETE FROM users
WHERE
(email='[email protected]' AND favorite_color='purple')
OR (email='[email protected]' AND favorite_color='violet')
OR (email='[email protected]' AND favorite_color='orange')
```

Just like with `SELECT`, using `UNNEST` here lets us keep the query constant, and use only 2 parameters, regardless of how many conditions we want to add.

> If you're using node.js, you can do all these operations without having to memorize the syntax by using [`@database/pg-typed`](https://www.atdatabases.org/docs/pg-guide-typescript) or [`@database/pg-bulk`](https://www.atdatabases.org/docs/pg-bulk).
Loading

0 comments on commit 793a37d

Please sign in to comment.