Sunday, February 20, 2011

Faster count(*), part 2

Previously, I introduced two ways of maintaining a materialized view to quickly return the number of rows in a table. The second solution, which is far superior when the table receives a lot of concurrent transactions, requires periodical cleanup. While writing a cron job is not very hard, it's one more thing that's not inside the database.

But wouldn't it be possible to do the cleanup inside the trigger function? There are two problems: 1) when should we run the cleanup? and 2) long-running transactions.

The second one might not be obvious; the problem is that if the trigger decides to run the cleanup procedure inside a transaction that goes on for a long time after that has to keep the DELETEd rows locked. If then another transaction decides to run the cleanup, it has to wait for the long-running transaction to finish. One might argue that a long-running transaction should not run the cleanup, but it would be nice if we could do better. A bit more about that later. Meanwhile, a few ways come into mind for the first problem:

  • Random: run the cleanup N % of the time. This approach can make the second problem even worse.
  • Time: run the cleanup if at least N seconds have elapsed since the last cleanup. Determining a good timeout can be hard.
  • Row count: run the cleanup if at least N rows have been added into counttbl since the last cleanup.

While you could use any combination of these three (or come up with new ones I've neglected), I personally like the last one the most for the discussed use case since it's consistent and fast if you implement it using sequences.

But what about the second problem? PostgreSQL 9.1 will have a nice solution built in: pg_try_advisory_xact_lock(). In previous versions you can LOCK a dummy table in NOWAIT mode, but unfortunately you have to use an expensive EXCEPTION block because LOCK TABLE throws an error if the table can't be locked immediately.

There are some additional challenges in maintaining more complex materialized views, and I am planning on following up on this topic a bit later.


  1. What about running the cleanup in a DEFERRED CONSTRAINT TRIGGER that runs at commit time?

  2. @Anonymous:

    That's a neat idea, but as a general solution I'd feel a bit uncomfortable putting the trigger on either of the tables. Using a dummy table just for queuing the trigger might work though. I think I still need to think about this some more..

  3. Thats great idea. Sometimes, I confused in applying faster count(*) part but you clarified my doubts at large extent. thanks a lot.