Monday, April 21, 2014

UPSERTisms in Postgres

As I'm sure a lot of people know already, PostgreSQL 9.4 will eventually be released without "native" support for UPSERT.  That said, it's still one of the most common questions on the excellent #postgresql IRC channel, and people are sometimes tempted to try and solve it via TRIGGERs (hell, even RULEs), unsuccessfully.

Even though people are often correctly told to "use a function" and are pointed to the example in the documentation, I often find myself needing some more information about the specific case to give a satisfactory answer.  The tricky part is that due to a race condition, a UNIQUE constraint is necessary to correctly implement UPSERTesque operations in Postgres, but unique violations abort the entire transaction unless a subtransaction is created.  Unfortunately, entering a subtransaction is often said to be expensive, so the trick is to try and identify cases where you can still implement the operation correctly without entering a subtransaction unless absolutely necessary.

So I got curious about the cases where you can get substantial performance benefits from not entering one, and benchmarked different concurrency-safe solutions to three of the common operations I've identified.  The three operations are:

  • INSERT or IGNORE:  INSERT, but if the row already exists, don't do anything.  No result required other than "success".
  • INSERT or SELECT:  INSERT .. RETURNING, but if the row already exists, return data from the existing row as the result instead.
  • INSERT or UPDATE:  INSERT .. RETURNING, but if the row already exists, UPDATE .. RETURNING instead.
For the purposes of this post, I'm going to be assuming that no DELETEs or UPDATEs of the PRIMARY KEY are ever executed on the target table.  Some of the examples would work correctly in these cases as well, but most of them would need small adjustments.

For testing, I pre-populated a table with N % of rows in a range, and then called the UPSERTy function once for every value in that range.  The idea was to get a rough approximation of the performance in "the row exists in N % of operations" for different values of N.  You can find all the code I used here.  Have eye bleach (or a fork) handy if you plan on looking at the bash scripts, though.

INSERT or IGNORE

For INSERT or IGNORE, I came up with three different solutions.  The first variant checks whether the row already exists, and only if it doesn't, tries the INSERT in a subtransaction.  This approach avoids having to enter a subtransaction if the value clearly already exists.  The hypothesis was that this approach would be fastest if the value already exists on at least every other attempt.

The second approach simply always enters a subtransaction and attempts to INSERT the record, ignoring the unique_violation exception if the row already exists. This is probably the fastest approach if most of the time the row does not already exist.

Finally, the last variant does an INSERT .. WHERE NOT EXISTS in a subtransaction.  I expected this to be slightly faster than the previous approach in some cases, but slower in almost all cases.
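
To make the shapes concrete, here is roughly what the first two variants might look like as PL/pgSQL functions.  This is just a sketch against a hypothetical table kv (k int PRIMARY KEY, v text); the actual benchmark code is linked above.

CREATE FUNCTION insert_or_ignore_v1(_k int, _v text) RETURNS void AS $$
BEGIN
    -- cheap check first: if the row is already there, we're done and
    -- never pay for a subtransaction
    PERFORM 1 FROM kv WHERE k = _k;
    IF FOUND THEN
        RETURN;
    END IF;

    -- the BEGIN .. EXCEPTION block is the subtransaction
    BEGIN
        INSERT INTO kv (k, v) VALUES (_k, _v);
    EXCEPTION WHEN unique_violation THEN
        NULL;  -- somebody else inserted it concurrently; ignore
    END;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION insert_or_ignore_v2(_k int, _v text) RETURNS void AS $$
BEGIN
    -- always enter the subtransaction and just swallow the error
    BEGIN
        INSERT INTO kv (k, v) VALUES (_k, _v);
    EXCEPTION WHEN unique_violation THEN
        NULL;  -- row already exists; ignore
    END;
END;
$$ LANGUAGE plpgsql;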

Here are the results:


As you can see, at ~40% of rows already existing, avoiding the subtransaction starts to become a performance gain.  Perhaps slightly surprisingly, INSERT .. WHERE NOT EXISTS in a subtransaction doesn't seem to ever make sense.

INSERT or SELECT

To implement INSERT or SELECT, I came up with a total of four solutions.  The first three are the equivalents of INSERT or IGNORE, as described above, and the last one uses a LOOP instead of copying the lookup directly into the exception handler block.  I had no hypothesis on whether the first or the last one would be faster; I was just curious to see if there was any difference at all.  The fourth version also supports DELETEs, unlike any of the other versions.
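
For illustration, the LOOP variant might look roughly like this (a sketch along the lines of the example in the PostgreSQL documentation, again using the hypothetical kv table from above):

CREATE FUNCTION insert_or_select(_k int, _v text) RETURNS text AS $$
DECLARE
    _result text;
BEGIN
    LOOP
        -- return the existing row if there is one
        SELECT v INTO _result FROM kv WHERE k = _k;
        IF FOUND THEN
            RETURN _result;
        END IF;

        -- not there: try to insert, but somebody might beat us to it
        BEGIN
            INSERT INTO kv (k, v) VALUES (_k, _v)
                RETURNING v INTO _result;
            RETURN _result;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- lost the race; loop back and SELECT again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;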

Results:


The results here look very similar to the INSERT or IGNORE case, though skipping the subtransaction makes sense even earlier, at around 30% of rows pre-existing in the target table.  No clear difference between methods one and four, not surprisingly.

INSERT or UPDATE

For the "traditional" UPSERT I just implemented the same approaches as for INSERT or SELECT.  Again, the fourth approach supports DELETEs.

Results:


Very similar results here as well, though methods one and four no longer gain as much from the row already existing, compared to how fast they are against an empty table.  Subtransactions are still expensive, though.

Conclusion

According to these results, you might want to consider avoiding the subtransaction if you expect the target row to exist roughly 30% of the time.  Definitely consider that approach if you expect the row to exist in more than 40% of cases.

Please remember that these results can only provide a general guideline.  In particular, the behaviour was not tested under any actual concurrency (other than manual tests to verify correctness).  If performance is very important for you, it's always up to you to measure different solutions and choose the one that works best for your application!

Sunday, October 28, 2012

As seen at PGConf.EU: call_graph

At the PostgreSQL Conference Europe 2012, Joel Jacobson was kind enough to present a project we've been working on: call_graph.  For those of you who weren't at the conference, call_graph is a PostgreSQL extension which tracks the stored procedure call stack during execution, gathering that data into a table.  That data can be analyzed or visualized in different ways; my attempt at visualizing it can be found under the utils/ subdirectory.  My apologies to anyone who's good with Perl.  I tried.

Unlike static analyzers, call_graph only looks at code paths that are actually executed, which results in data that is both more accurate and simpler.  Additionally, because it doesn't need to understand the code being executed, it works equally well regardless of the language you've chosen for your stored procedures.

The extension currently only works in PostgreSQL 9.1, but supporting 9.2 should be relatively simple.  If you want to use the utility scripts I've created to visualize the data, you will need Perl and GraphViz.

I can't recommend running the code in production, but if you do, let me know!  In any case, feel free to contact me with feature requests, bug reports (I think you can file them on github, which would also keep them publicly available) or anything else you can think of.

Sunday, February 20, 2011

Faster count(*), part 2

Previously, I introduced two ways of maintaining a materialized view to quickly return the number of rows in a table. The second solution, which is far superior when the table receives a lot of concurrent transactions, requires periodic cleanup. While writing a cron job is not very hard, it's one more thing that's not inside the database.

But wouldn't it be possible to do the cleanup inside the trigger function? There are two problems: 1) when should we run the cleanup? and 2) long-running transactions.

The second one might not be obvious; the problem is that if the trigger runs the cleanup procedure inside a transaction that then goes on for a long time, that transaction has to keep the DELETEd rows locked the whole time. If another transaction then decides to run the cleanup, it has to wait for the long-running transaction to finish. One might argue that a long-running transaction should not run the cleanup, but it would be nice if we could do better. A bit more about that later. Meanwhile, a few ways come to mind for the first problem:

  • Random: run the cleanup N % of the time. This approach can make the second problem even worse.
  • Time: run the cleanup if at least N seconds have elapsed since the last cleanup. Determining a good timeout can be hard.
  • Row count: run the cleanup if at least N rows have been added into counttbl since the last cleanup.

While you could use any combination of these three (or come up with new ones I've neglected), I personally like the last one the most for the discussed use case since it's consistent and fast if you implement it using sequences.

But what about the second problem? PostgreSQL 9.1 will have a nice solution built in: pg_try_advisory_xact_lock(). In previous versions you can LOCK a dummy table in NOWAIT mode, but unfortunately you have to use an expensive EXCEPTION block because LOCK TABLE throws an error if the table can't be locked immediately.
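
Putting those pieces together, the trigger from the previous post might grow into something like this rough sketch.  It assumes the counttbl setup and update_counttbl() function from the previous post; the sequence name, the 1000-row threshold and the advisory lock key 42 are all made up for illustration, and both pg_try_advisory_xact_lock() and the writable CTE require 9.1.

CREATE SEQUENCE counttbl_deltas;  -- counts rows added to counttbl

CREATE OR REPLACE FUNCTION update_counttbl() RETURNS trigger AS
$$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO counttbl VALUES(1);
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO counttbl VALUES(-1);
    END IF;

    -- every ~1000 delta rows, try to consolidate them into a single row,
    -- but skip the cleanup entirely if another session is already at it
    IF nextval('counttbl_deltas') % 1000 = 0 AND
       pg_try_advisory_xact_lock(42) THEN  -- 42: arbitrary key for this cleanup
        WITH deleted AS (
            DELETE FROM counttbl RETURNING rowcount
        )
        INSERT INTO counttbl
            SELECT sum(rowcount) FROM deleted;
    END IF;

    RETURN NULL;  -- ignored in an AFTER trigger
END;
$$ LANGUAGE plpgsql;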

There are some additional challenges in maintaining more complex materialized views, and I am planning on following up on this topic a bit later.

Sunday, December 19, 2010

Queues in SQL

This is actually a post I should've written a long time ago, but never got to it. The problem this post is discussing is implementing a queue in SQL. There are a lot of problems and some of them aren't immediately obvious, so it's not surprising that this is a frequently asked question on the IRC channel. Let's use the following schema:

CREATE TABLE items
(
    id serial PRIMARY KEY,
    available boolean DEFAULT TRUE NOT NULL

    -- application-specific data here
);

-- fast lookup of available items
CREATE INDEX items_lookup_index ON
    items (id) WHERE available;

And you have one or more "consumer" processes grabbing items from the queue, processing them and then deleting them. When a consumer starts processing an item, it marks it "not available". After the consumer is done, it removes the row from the table. You also have zero or more "producer" processes inserting items into the queue.



Inserting items is quite straightforward, but how would you pop an item off the queue? There is no UPDATE .. LIMIT in PostgreSQL (yet), so that won't work. Instead, you might be tempted to do something like this:

UPDATE
    items
SET available = FALSE
WHERE id =
    (
        SELECT
            min(id)
        FROM items
        WHERE available
    )
RETURNING *
;

But there is a small problem with correctness: two concurrently running consumers might both get the same row. The reason for this is that the subquery is evaluated using a snapshot taken when the query started. If the "availability" of the row changes after that (i.e. another concurrently running process grabs the item), we still UPDATE it and both clients think they got the item. Fortunately, there's a way around this; add the check to the WHERE clause of the UPDATE:

UPDATE
    items
SET available = FALSE
WHERE id =
    (
        SELECT
            min(id)
        FROM items
        WHERE available
    )

-- important
AND available

RETURNING *
;

This works, because the WHERE clause in an UPDATE is always evaluated against the latest version of the row (assuming we're dealing with a table and not a view). However, the subquery always returns exactly one row, and if that row does not match our criteria when we get to the UPDATE, we might end up updating no rows! So either the application needs to be prepared to redo the UPDATE or (what I think is a lot better) this code has to be in a server-side function which retries it automatically.
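
For example, such a retrying wrapper might look roughly like this sketch (pop_item and its details are made up for illustration, and it assumes the default READ COMMITTED isolation level so each retry sees a fresh snapshot):

CREATE FUNCTION pop_item() RETURNS items AS $$
DECLARE
    _item items;
BEGIN
    LOOP
        UPDATE
            items
        SET available = FALSE
        WHERE id =
            (
                SELECT
                    min(id)
                FROM items
                WHERE available
            )

        -- important
        AND available

        RETURNING * INTO _item;

        IF FOUND THEN
            RETURN _item;  -- got one
        END IF;

        -- if the queue is genuinely empty, give up and let the caller decide
        IF NOT EXISTS (SELECT 1 FROM items WHERE available) THEN
            RETURN NULL;
        END IF;
        -- otherwise we just lost the race; try again
    END LOOP;
END;
$$ LANGUAGE plpgsql;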


Now that correctness is not an issue and you have deployed the code to your production server, you might notice that your clients spend a lot of time just popping items off the queue. Because all consumers try to lock the same row, they end up waiting for the "fastest" transaction to finish, just to notice that the row was already processed. So effectively, only one consumer transaction can be running at a time, and the rest need to do a lot of unnecessary work. In the absolute worst case scenario your consumers process the items very quickly and almost immediately come back to compete with the other processes again. However, if you only have a small number of consumers and/or processing an item takes a significant amount of time, you might not experience any problems with this approach. But if you do, there is a way to make the queue faster: advisory locks.

What we need is a way to say "if this item is not immediately available, try the next one". There is no way to do that using only SQL, but fortunately we can mix function calls into our SQL. Viz:

UPDATE
    items
SET available = FALSE
WHERE id =
    (
        SELECT
            min(id)
        FROM items
        WHERE
            available AND
            pg_try_advisory_lock(id)
    )
RETURNING *
;

pg_try_advisory_lock() tries to lock an application-defined resource identified by a bigint and returns TRUE if the resource was successfully locked, or FALSE if it could not be locked because another session is already holding a lock on it. Postgres does not use advisory locks internally, and it is up to the application to define the meaning of the bigint parameter; in this case, the item's id works well. By doing this, we tell PG to skip any row we can't lock immediately, which is exactly what we wanted. Great! Well, not exactly. There are still three problems with this:

  1. The snapshot problem mentioned earlier is still there.
  2. The planner might prefer a sequential scan over an index scan for the subquery.
  3. Advisory locks are not released automatically (not even when the transaction ends), so we need to manually release them.

The first one is easy to fix: again, add the check to the WHERE clause of the UPDATE. To fix the second one, we need to do a bit more work. The usual solution looks as follows (just the subquery part):

SELECT
    id
FROM
(
    SELECT
        id
    FROM items
    WHERE available
    ORDER BY id
    LIMIT $n
) ss
WHERE
    pg_try_advisory_lock(id)
ORDER BY id
LIMIT 1
;

where $n is the maximum number of consumer processes you expect to be running at any time. What happens here is the subquery tells the planner exactly how many rows we end up processing in the worst case, and then on the outer level we only take the rows we need (and can get). This way the planner doesn't have to try to guess for which rows pg_try_advisory_lock() returns TRUE and it can usually choose the best plan. Here's our final query:

UPDATE
    items
SET available = FALSE
WHERE id =
    (
        SELECT
            id
        FROM
        (
            -- try to convince the planner to choose an
            -- index scan over a seq scan
            SELECT
                id
            FROM items
            WHERE available
            ORDER BY id
            LIMIT $n
        ) ss
        WHERE
            pg_try_advisory_lock(id)
        ORDER BY id
        LIMIT 1
    )

-- important
AND available

RETURNING *
;

Unfortunately, even if we use advisory locks, it is possible for this query to return 0 rows. The later we release the advisory lock, the smaller the probability of that happening becomes. In practice, if you release the advisory lock after COMMIT, the chance of that happening is very close to 0, but releasing advisory locks after COMMIT puts the burden on the client, because you can't force a COMMIT in a server-side function (anyone feel like implementing stored procedures? :-) ). Even if you release the lock before committing, you should be fine as long as you have the retry logic in place. Test it out and see how it works!
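
For example, a consumer that releases the lock just before committing might run something along these lines (with 42 standing in for the id returned by the pop query):

BEGIN;
-- pop an item with the UPDATE .. RETURNING query from above; say it returned id 42
-- ... process the item in the application ...
DELETE FROM items WHERE id = 42;    -- the item is fully processed
SELECT pg_advisory_unlock(42);      -- other consumers may now see this id again
COMMIT;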

Keep in mind that this is not a complete solution. What if a consumer dies while processing an item? What if the server crashes? The best way to deal with this is very application specific, so I decided not to cover it, at least not in this post.

You might also have multiple queues in a single database, so using the bigint version of advisory locks might result in problems. In that case, you can sometimes use the integer version of the function like this: pg_try_advisory_lock('tablename'::regclass::int, id). For information about regclass, see the docs.

Friday, November 5, 2010

PGDay

I will be talking at PGDay.eu in December about concurrency-related problems in PostgreSQL (and hopefully a few solutions too!). If your database has more than one user, this is the talk for you!

I'll start from the ground up, but you should be familiar with the basics of postgres (or some other SQL database).

Saturday, July 17, 2010

Smallest available ID

Today an interesting problem came up on the IRC channel: Given a table of integer IDs, find the first non-existing ID starting from zero. A straightforward solution using generate_series() and a left join is in the comments, but we're trying to do better than that.
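
(For reference, that straightforward approach might look roughly like the following; it works, but it generates the entire ID range just to find one value.)

SELECT
    s.i AS id
FROM
    generate_series(0, (SELECT coalesce(max(id), 0) + 1 FROM foo)) AS s(i)
    LEFT JOIN foo ON foo.id = s.i
WHERE
    foo.id IS NULL
ORDER BY
    s.i
LIMIT
    1
;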

For this solution, there are four cases we need to worry about (and I only thought of one of these on my own, I'll blame the lack of caffeine):

  • There are no rows in the table.
  • There are rows in the table, but their IDs are all greater than zero.
  • There are rows between zero and N, but there are gaps.
  • There are rows between zero and N and no gaps.
The first two cases can be covered by returning 0 whenever it's available. Once that special case is handled, the table either has a gap somewhere between zero and N, or it is gapless. Finding the gap with the smallest ID is relatively trivial with window functions (8.4 or later only, sorry), and in a gapless table we can use max + 1. Let's take a look at the actual query:

-- Use zero if available
(SELECT
    0 AS id
 WHERE
    NOT EXISTS
        (SELECT 1 FROM foo WHERE id = 0) )

    UNION ALL

-- Find the smallest available ID inside a gap
(SELECT
    id + 1
 FROM
 (
    SELECT
        id, lead(id) OVER (ORDER BY id)
    FROM
        foo
 ) ss
 WHERE
    lead - id > 1
 ORDER BY
    id
 LIMIT
    1)

    UNION ALL

-- Or if there were no gaps and zero wasn't
-- available, use max + 1.  But don't return
-- a row if the table is empty; the first
-- query took care of that.
(SELECT
    max(id) + 1
 FROM
    foo
 HAVING
     max(id) IS NOT NULL)

ORDER BY
    id
LIMIT
    1
;

The last ORDER BY id LIMIT 1 affects the result of the whole UNION so we always get the smallest available ID if more than one of these queries return a row. For that same reason we also need to use parentheses around the second query; we only want to return one ID if there are several gaps or the only gap is wide. I used parentheses for all three queries because I thought having them only around one query in a UNION looked funny.

For the record, finding the gap on a version older than 8.4 is still possible, but I'm not going to cover it here. Consider upgrading ;-)



Edit: It occurred to me that we don't need to separately query for the max value; lead(id) will be NULL for the last row. This gives us a bit nicer query:

-- Use zero if available
(SELECT
    0 AS id
 WHERE
    NOT EXISTS
        (SELECT 1 FROM foo WHERE id = 0) )

    UNION ALL

-- Find the smallest available ID inside a gap, or max + 1
-- if there are no gaps.
(SELECT
    id + 1
 FROM
 (
    SELECT
        id, lead(id) OVER (ORDER BY id)
    FROM
        foo
 ) ss
 WHERE
    lead - id > 1 OR
    lead IS NULL
 ORDER BY
    id
 LIMIT
    1)

ORDER BY
    id
LIMIT
    1
;

Unfortunately, you still need the first query of the UNION, but there's a way around that, too: insert a dummy row with id = -1 into the table. That gives us an even nicer query:

SELECT
    id + 1
FROM
(
    SELECT
        id, lead(id) OVER (ORDER BY id)
    FROM
        foo
) ss
WHERE
    lead - id > 1 OR
    lead IS NULL
ORDER BY
    id
LIMIT
    1
;

You could also add the dummy row in a subquery instead of selecting straight from the table, but then the query can't use an index at all.
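
Just for completeness, that variant might look something like this:

SELECT
    id + 1
FROM
(
    SELECT
        id, lead(id) OVER (ORDER BY id)
    FROM
        (SELECT -1 AS id UNION ALL SELECT id FROM foo) dummy
) ss
WHERE
    lead - id > 1 OR
    lead IS NULL
ORDER BY
    id
LIMIT
    1
;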

Tuesday, June 29, 2010

Faster count(*)

One of the most common questions on IRC is "how to make SELECT count(*) FROM tbl; faster?" While it's not clear why an application would need to run this query repeatedly, it is possible to make it a bit faster. There are two common approaches:

The easiest way is to use a trigger to update a static row count in a separate table:

CREATE TABLE counttbl
    (rowcount int NOT NULL,
     CHECK (rowcount >= 0));

CREATE UNIQUE INDEX "counttbl_single_row"
    ON counttbl((1)); -- only allow one row in counttbl

CREATE FUNCTION update_counttbl() RETURNS trigger AS
$$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE counttbl SET rowcount = rowcount + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE counttbl SET rowcount = rowcount - 1;
    END IF;
    RETURN NULL; -- ignored in an AFTER trigger
END;
$$ language plpgsql;

CREATE TRIGGER update_rowcount AFTER INSERT OR DELETE ON realtbl
    FOR EACH ROW EXECUTE PROCEDURE update_counttbl();

INSERT INTO counttbl SELECT count(*)
    FROM realtbl; -- populate the count table

This approach is fairly straightforward, but suffers from a huge problem: only one transaction can INSERT INTO or DELETE FROM the table at a time because UPDATE on counttbl locks the row exclusively (the lock is necessary though). With some additional trickery, we can avoid this and achieve better concurrency:

CREATE TABLE counttbl
    (rowcount int NOT NULL,
     CHECK (rowcount >= -1));

CREATE UNIQUE INDEX "counttbl_single_row" ON counttbl((1))
    WHERE rowcount > 1; -- only allow one "sum" row

CREATE FUNCTION update_counttbl() RETURNS trigger AS
$$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO counttbl VALUES(1);
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO counttbl VALUES(-1);
    END IF;
    RETURN NULL; -- ignored in an AFTER trigger
END;
$$ language plpgsql;

CREATE TRIGGER update_rowcount AFTER INSERT OR DELETE ON realtbl
    FOR EACH ROW EXECUTE PROCEDURE update_counttbl();

INSERT INTO counttbl SELECT count(*)
    FROM realtbl; -- populate the count table

Now we can run concurrent INSERTs and DELETEs on realtbl because we don't need to UPDATE any rows. To actually find out the count you'd run SELECT sum(rowcount) FROM counttbl; This approach has two problems though: it's a bit slower to find out the count than in the first solution, and it requires you to periodically consolidate the accumulated delta rows back into a single "sum" row.
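
The cleanup can be as simple as collapsing all the delta rows into one row; on 9.1 or later, for example, something along these lines (older versions would need two statements and a bit more care):

-- collapse the accumulated delta rows into a single row;
-- rows inserted by concurrent transactions are simply left alone
WITH deleted AS (
    DELETE FROM counttbl RETURNING rowcount
)
INSERT INTO counttbl
    SELECT sum(rowcount) FROM deleted
    HAVING count(*) > 0;  -- don't insert anything if there was nothing to collapse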

As always, neither of these solutions work for every possible case. It's up to you to choose the one that works for your application.