Tuesday, June 29, 2010

Writeable CTEs

For the past year or so, I've worked on a feature known as "Writeable CTEs". While some people know what this feature is about and would want to use it right now (I'm talking about you, Merlin), there's still a lot of people who don't know what they offer. So far, I've identified two major use cases, but I'm quite sure people will find more as they wrap their heads around the feature. ;-) Let's take a look at what I've got so far:

1. Moving rows from one table to another

On any released PG version, you would usually do this:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO target SELECT * FROM source;
DELETE FROM source;
COMMIT;

There are some drawbacks to this approach, though:

  1. If you have multiple processes running this same transaction at the same time, you get a lot of serialization errors.
  2. If the source table gets a lot of UPDATEs and DELETEs, you get a lot of serialization errors.
  3. You need to do two accesses to the heap.
  4. You need to do it in SERIALIZABLE isolation mode.

While you can avoid #1 quite easily by using a lock for these processes, effectively allowing only one of them to do this at a time, #2 is a lot harder to avoid. #3 shouldn't be significant in practice but can be in some scenarios. #4 can actually be problematic.

With the new feature, you can avoid all four drawbacks. The syntax is* also a lot more intuitive:

WITH t AS
    (DELETE FROM source RETURNING *)
INSERT INTO target SELECT * FROM t;

This will do exactly what it looks like: first delete the rows from "source" and then insert them into "target". While the first version of this feature (which I suspect we'll see in 9.1) will need to materialize the complete DELETE result set first, I'm hopeful that we can remove that need in the future.

2. INSERTing into multiple tables

Imagine you're writing a web application which collects information about people and their pets. When the user has typed in his real name and the names of his pets, you want to add him to your (normalized) database. Like any other web app, you're using surrogate keys. Normally, you would first INSERT the user and get the userid with RETURNING or currval() and then INSERT the pets. While this doesn't seem too bad, with a bigger application you might end up doing tens of round-trips to the server. With writeable CTEs, you can do this easily:

WITH person AS
    (INSERT INTO persons VALUES ('Marko Tiikkaja') RETURNING userid)
INSERT INTO pets
SELECT
    person.userid, pet
FROM
    person
CROSS JOIN
    unnest(ARRAY['dog1', 'dog2', 'cat1']) pet

First, my name is added to "persons" table and the "person" CTE holds a single row with my userid. Now we want to add one row with this userid for each pet into the table "pets". This can be done easily with a CROSS JOIN. I could've also used the syntax FROM person, unnest(..) but I wanted to make clear that a cross join was desirable. We could also easily add different persons with different pets by also putting the "INSERT INTO pets .." statements into their own CTEs (you can have about as many CTEs as you need).


* It is not yet clear that the syntax will be exactly this, but I'm going to try to get there. :-)

Edit: s/REPEATABLE READ/SERIALIZABLE/ to avoid confusion.

9 comments:

  1. Now that I know its benefit, I'd like to see what is the hard point to implement this next.

    ReplyDelete
  2. BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

    Does postgresql honor the repeatable read isolation level?

    ReplyDelete
  3. @Richard Broersma:

    Sure, as long as the SQL standard is concerned.

    I was only using "REPEATABLE READ" here to make a distinction between the SERIALIZABLE mode we have now and true serializability. I probably should've pointed that out or just used SERIALIZABLE. That transaction won't work if phantom reads can happen.

    ReplyDelete
  4. @Hitoshi Harada:

    There were some hard-to-fix problems with the design I worked on for 9.0. Currently, I'm working on a design suggested by Tom at: http://archives.postgresql.org/pgsql-hackers/2009-11/msg01860.php

    I will be posting a patch and the new design on -HACKERS shortly. Feel free to participate :-)

    ReplyDelete
  5. For the first example, couldn't you also do that without CTEs, like this?

    INSERT INTO target SELECT * FROM (DELETE FROM source RETURNING *);

    In the first example, the DELETE-RETURNING was being used in a context that would normally take a SELECT, so could we use a DELETE-RETURNING in other SELECT-expecting contexts, such as in the above example?

    ReplyDelete
  6. @Darren Duncan:

    No. This has been discussed many times before and the only way this is going to work is to allow DML statements only inside top-level CTEs.

    ReplyDelete
  7. thank you for writing query. I am looking for query through which I could transfer some specific row data from one table to another one. Could you please help me ?

    ReplyDelete
  8. I'm sorry for this intrusion, I see from your site that we are very much into the same things.

    ReplyDelete