Thursday, November 6, 2014

PostgreSQL gotcha of the week, week 45

Something happened earlier this week (or maybe it was last week, I forget), and I was quite perplexed for a good 30 seconds, so I thought I'd try and make this problem a bit more well-known.  Consider the following schema:

CREATE TABLE blacklist (
  person text
);

INSERT INTO blacklist VALUES ('badguy');

CREATE TABLE orders (
  orderid serial PRIMARY KEY,
  person text,
  amount numeric
);

CREATE FUNCTION is_blacklisted(_person text)
  RETURNS boolean
  STABLE
AS $$
BEGIN
  RETURN EXISTS (SELECT 1 FROM blacklist
                 WHERE person = _person);
END
$$ LANGUAGE plpgsql;

CREATE FUNCTION new_order(_person text, _amount numeric)
  RETURNS bigint
AS $$
DECLARE
_orderid bigint;
BEGIN
  IF is_blacklisted(_person) THEN
    RAISE EXCEPTION 'person has been blacklisted';
  END IF;

  INSERT INTO orders (person, amount)
    VALUES (_person, _amount)
  RETURNING orderid INTO _orderid;
  RETURN _orderid;
END
$$ LANGUAGE plpgsql;

Now everything is running smoothly:

=# select new_order('goodguy', 100.0);
 new_order 
-----------
         1
(1 row)

=# select new_order('badguy', 100.0);
ERROR:  person has been blacklisted

But something's wrong with the blacklist table: it doesn't tell you who blacklisted the person or why.  You also happen to know that you added all of the existing people on the blacklist, so you want to have them reflect that.  You run the following ALTER TABLEs:

ALTER TABLE blacklist
  ADD COLUMN reason text,
  -- use DEFAULT to assign values to existing rows
  ADD COLUMN blacklisted_by text
    NOT NULL DEFAULT 'you';

-- and then drop it
ALTER TABLE blacklist
  ALTER COLUMN blacklisted_by
    DROP DEFAULT;

And everything seems to be working nicely.  However, a few days later you realize that "badguy" stole all your money by getting through a couple of fake orders.  How did that happen?

The answer is a bit complicated, but the gist of it is that if a STABLE function accesses the table for the first time in a transaction while a table-rewriting operation (such as the ALTER TABLE we used) is going on concurrently, it will see an empty table, instead of either the old or the new version of the data.  is_blacklisted() returned FALSE because of this, and "badguy" happened to be able to create an order.  This caveat applies to ALL SERIALIZATION MODES; not even READ COMMITTED users are safe.  However, SQL language functions don't seem to have this problem in more recent versions of postgres.

This really questions the safety of table-rewriting operations in an application heavy on PL/PgSQL (or some other procedural languages).  I know we're banning them internally.