Bulk inserting nested data into the database (part II)

Gajus Kuizinas

Fullstack Engineer
Node.js
PostgreSQL

Back when I originally published Slonik PostgreSQL client for Node.js, I was working with large datasets, inserting tens of millions of records per hour. However, the shape of the data being inserted was usually pretty flat and therefore pretty easy to insert using INSERT INTO ... SELECT * FROM unnset() pattern. At the time, I recommended the unnest pattern as the go-to pattern for inserting large number of rows in batches (that was part I).

However, today I've found a better way: jsonb_to_recordset.

jsonb_to_recordset expands the top-level JSON array of objects to a set of rows having the composite type defined by an AS clause.

Put it simply, jsonb_to_recordset allows to convert arbitrary JSON into a recordset which we can use to query and insert records from, just like with unnest. However, unlike unnest, since we are simply passing a JSON to PostgreSQL, the resulting format is a lot more expressive and powerful.

SELECT *
FROM json_to_recordset('[{"name":"John","tags":["foo","bar"]},{"name":"Jane","tags":["baz"]}]')
AS t1(name text, tags text[]);
 name |   tags
------+-----------
 John | {foo,bar}
 Jane | {baz}
(2 rows)

Let's demonstrate how you would use it to insert data.

Inserting data using json_to_recordset

Let's say you need to insert a list of persons into the database, and each person has an array of attributes associated with them.

const persons = [
  {
    name: 'John',
    tags: ['foo', 'bar']
  },
  {
    name: 'Jane',
    tags: ['baz']
  }
];

You might be tempted to simply iterate through the array and fire a separate INSERT query for each record, e.g.

for (const person of persons) {
  await pool.query(sql`
    INSERT INTO person (name, tags)
    VALUES (
      ${person.name},
      ${sql.array(person.tags, 'text[]')}
    )
  `);
}

When dealing with a few records, this is even preferable, because it is easy to read and understand. However, if you are like me and you find yourself debugging an INSERT query that is called 2M+ times per day, then chances are that batching inserts is preferable.

What prompted the search for better alternatives.

You could also attempt to insert using unnest pattern and it would look something like this:

await pool.query(sql`
  INSERT INTO public.person (name, tags)
  SELECT t1.name, t1.tags::text[]
  FROM unnest(
    ${sql.array(['John', 'Jane'], 'text')},
    ${sql.array(['{foo,bar}', '{baz}'], 'text')}
  ) AS t1.(name, tags);
`);

The problem here is that you are effective responsible for converting arrays into PostgreSQL array string representation and passing them as text parameters, which is ugly. You also need to iterate the array to construct slices that represent values for each column, which is also quite ugly.

However, thanks to jsonb_to_recordset, we can achieve the same result with:

await pool.query(sql`
  INSERT INTO person (name, tags)
  SELECT *
  FROM jsonb_to_recordset(${sql.jsonb(persons)}) AS t(name text, tags text[])
`);

In contrast to the unnest approach, using jsonb_to_recordset we can easily insert complex nested data structures, and we can pass the original JSON document to the query without needing to manipulate it.

In terms of performance they are also exactly the same. As such, my current recommendation is to prefer jsonb_to_recordset whenever inserting lots of rows or nested data structures.

2022

Partner With Gajus
View Services

More Projects by Gajus