johto's lair

LISTENing connections aren't cheap

2015-11-18T03:30:00.000+01:00

Recently I used perf to look into what could be the cause for our increased CPU usage on the PostgreSQL server (encouraged by Andres' great talk at pconf.eu). I was somewhat surprised to find that thirty percent of the CPU time used by postgres was spent spinning on spinlocks, i.e. doing no actual useful work. Digging into the profile a bit more, most of these were coming from a

allas: connection pooling for LISTEN / NOTIFY

2015-02-04T21:20:00.002+01:00

Lately I've been working on a connection pooler which only supports LISTEN / NOTIFY. The idea is to be able to keep the number of Postgres connections down without having to give up (or come up with a workaround for) notifications. With allas you can e.g. use pgbouncer or your environment's native connection pool and open a separate connection for notifications only. allas

Queues, queues, they all fall down

2015-01-15T01:15:00.003+01:00

Roughly five years ago I presented a way of pulling items off a queue table using advisory locks. Now that the upcoming 9.5 release is going to have a similar method built in, I wanted to explore the performance characteristics of the different approaches available starting from PostgreSQL 9.5. To recap, the problem is: given a table of items (or "jobs"), distribute the items to a set

PostgreSQL gotcha of the week, week 45

2014-11-06T00:03:00.003+01:00

Something happened earlier this week (or maybe it was last week, I forget), and I was quite perplexed for a good 30 seconds, so I thought I'd try and make this problem a bit more well-known. Consider the following schema: CREATE TABLE blacklist ( person text ); INSERT INTO blacklist VALUES ('badguy'); CREATE TABLE orders ( orderid serial PRIMARY KEY, person text, amount numeric )

PostgreSQL gotcha of the week, week 19

2014-05-08T20:19:00.000+02:00

local:marko=# create table foo(); CREATE TABLE local:marko=#* insert into foo default values; INSERT 0 1 local:marko=#* create function getdata(foo) returns table (a int, b int) as $$ begin if random() < 0.5 then a := 1; b := 1; return next; else a := 2; b := 2; return next; end if ; end $$ language plpgsql ; CREATE FUNCTION local:marko=#* select (getdata(foo)).* from foo; a |

Why I consider USING harmful

2014-05-01T02:31:00.002+02:00

Often on the topic of the different JOIN syntaxes in SQL I mention that I consider ON the only reasonable way to specify JOIN clauses in production code, and I get asked for details. It always takes me a while to recall the specific problem and to come up with a plausible scenario, so I figured it would be easier to document this once and for all. Suppose you have a database containing

SQL gotcha of the week

2014-04-26T14:44:00.000+02:00

SELECT 'foo' IN ('foo' 'bar'); ?column? ---------- f (1 row) Be careful out there.

UPSERTisms in Postgres, part 2: congestion

2014-04-23T02:50:00.000+02:00

In my previous post I benchmarked a number of different UPSERTy implementations. However, a number of people (rightfully) asked me how the results would change under concurrency. So I had to come up with a way of testing that. I wanted reproducible numbers, so I figured that just throwing a lot of UPSERTs at a database was not acceptable, not to mention the problems with keeping that

UPSERTisms in Postgres

2014-04-21T03:01:00.001+02:00

As I'm sure a lot of people know already, PostgreSQL 9.4 will eventually be released without "native" support for UPSERT. That said, it's still one of the most common questions on the excellent #postgresql IRC channel, and people are sometimes tempted to try and solve it via TRIGGERs (hell, even RULEs), unsuccessfully. Even though people are often correctly told to "use a function" and

As seen at PGConf.EU: call_graph

2012-10-28T17:53:00.001+01:00

At the PostgreSQL Conference Europe 2012, Joel Jacobson was kind enough to present a project we've been working on: call_graph. For those of you who weren't at the conference, call_graph is a PostgreSQL extension which tracks the stored procedure call stack during execution, gathering that data into a table. That data can be analyzed or visualized in different ways; my attempt at

Faster count(*), part 2

2011-02-20T16:06:00.005+01:00

Previously, I introduced two ways of maintaining a materialized view to quickly return the number of rows in a table. The second solution, which is far superior when the table receives a lot of concurrent transactions, requires periodical cleanup. While writing a cron job is not very hard, it's one more thing that's not inside the database. But wouldn't it be possible to do the cleanup inside

Queues in SQL

2010-12-19T14:30:00.007+01:00

This is actually a post I should've written a long time ago, but never got to it. The problem this post is discussing is implementing a queue in SQL. There are a lot of problems and some of them aren't immediately obvious, so it's not surprising that this is a frequently asked question on the IRC channel. Let's use the following schema: CREATE TABLE items ( id serial PRIMARY KEY,

PGDay

2010-11-05T22:45:00.004+01:00

I will be talking at PGDay.eu in December about concurrency-related problems in PostgreSQL (and hopefully a few solutions too!). If your database has more than one user, this is the talk for you! I'll start from the basics, but you should be familiar with the basics of postgres (or some other SQL database).

Smallest available ID

2010-07-17T11:00:00.017+02:00

Today an interesting problem came up on the IRC channel: Given a table of integer IDs, find the first non-existing ID starting from zero. A straightforward solution using generate_series() and a left join is in the comments, but we're trying to do better than that. For this solution, there are four cases we need to worry about (and I only thought of one of these on my own, I'll blame the lack

Faster count(*)

2010-06-29T22:23:00.012+02:00

One of the most common questions on IRC is "how to make SELECT count(*) FROM tbl; faster?" While it's not clear why an application would need to run this query repeatedly, it is possible to make it a bit faster. There are two common approaches: The easiest way is to use a trigger to update a static row count in a separate table: CREATE TABLE counttbl (rowcount int NOT NULL, CHECK (

Writeable CTEs

2010-06-29T18:46:00.009+02:00

For the past year or so, I've worked on a feature known as "Writeable CTEs". While some people know what this feature is about and would want to use it right now (I'm talking about you, Merlin), there's still a lot of people who don't know what they offer. So far, I've identified two major use cases, but I'm quite sure people will find more as they wrap their heads around the feature. ;-) Let's

Introduction

2010-06-20T14:28:00.008+02:00

I have been thinking about this for quite some time now, and I finally decided to start a blog. Since most of you probably don't know me, I guess a small introduction is necessary. My name is Marko Tiikkaja. I've used PostgreSQL for the past 7 years or so and loved it more and more every year. For the past two years I've been active on freenode's #postgresql channel under the alias "johto".