johto's lair: July 2010

Today an interesting problem came up on the IRC channel: Given a table of integer IDs, find the first non-existing ID starting from zero. A straightforward solution using generate_series() and a left join is in the comments, but we're trying to do better than that.

For this solution, there are four cases we need to worry about (and I only thought of one of these on my own, I'll blame the lack of caffeine):

There are no rows in the table.
There are rows in the table, but their IDs are all greater than zero.
There are rows between zero and N, but there are gaps.
There are rows between zero and N and no gaps.

The first two can be covered by returning 0 when it's available. This will either create a gap starting from 0, make a gap that already existed in the table the gap that has the smallest ID, or make the table gapless. Finding the gap with the smallest ID is relatively trivial with window functions (8.4 or later only, sorry) and in a gapless table, we can use max + 1. Let's take a look at the actual query:

-- Use zero if available
(SELECT
    0 AS id
 WHERE
    NOT EXISTS
        (SELECT 1 FROM foo WHERE id = 0) )

    UNION ALL

-- Find the smallest available ID inside a gap
(SELECT
    id + 1
 FROM
 (
    SELECT
        id, lead(id) OVER (ORDER BY id)
    FROM
        foo
 ) ss
 WHERE
    lead - id > 1
 ORDER BY
    id
 LIMIT
    1)

    UNION ALL

-- Or if there were no gaps and zero wasn't
-- available, use max + 1.  But don't return
-- a row if the table is empty; the first
-- query took care of that.
(SELECT
    max(id) + 1
 FROM
    foo
 HAVING
     max(id) IS NOT NULL)

ORDER BY
    id
LIMIT
    1
;

The last ORDER BY id LIMIT 1 affects the result of the whole UNION so we always get the smallest available ID if more than one of these queries return a row. For that same reason we also need to use parentheses around the second query; we only want to return one ID if there are several gaps or the only gap is wide. I used parentheses for all three queries because I thought having them only around one query in a UNION looked funny.

For the record, finding the gap on a version older than 8.4 is still possible, but I'm not going to cover it here. Consider upgrading ;-)

Edit: It occurred to me that don't need to separately query for the max value; lead(id) will be NULL for the last row. This gives us a bit nicer query:

-- Use zero if available
(SELECT
    0 AS id
 WHERE
    NOT EXISTS
        (SELECT 1 FROM foo WHERE id = 0) )

    UNION ALL

-- Find the smallest available ID inside a gap, or max + 1
-- if there are no gaps.
(SELECT
    id + 1
 FROM
 (
    SELECT
        id, lead(id) OVER (ORDER BY id)
    FROM
        foo
 ) ss
 WHERE
    lead - id > 1 OR
    lead IS NULL
 ORDER BY
    id
 LIMIT
    1)

ORDER BY
    id
LIMIT
    1
;

Unfortunately, you still need the first query of the UNION, but there's a way around that, too: insert a dummy row with id = -1 to the table. That gives us an even nicer query:

SELECT
    id + 1
FROM
(
    SELECT
        id, lead(id) OVER (ORDER BY id)
    FROM
        foo
) ss
WHERE
    lead - id > 1 OR
    lead IS NULL
ORDER BY
    id
LIMIT
    1
;

You could also add the dummy row in a subquery instead of selecting straight from the table, but then the query can't use an index at all.

johto's lair

Saturday, July 17, 2010

Smallest available ID