johto's lair: UPSERTisms in Postgres, part 2: congestion

In my previous post I benchmarked a number of different UPSERTy implementations. However, a number of people (rightfully) asked me how the results would change under concurrency. So I had to come up with a way of testing that.

I wanted reproducible numbers, so I figured that just throwing a lot of UPSERTs at a database was not acceptable, not to mention the problems with keeping that up while maintaining that a certain number of rows pre-exist. Instead, I changed the implementation a bit: with a small change I persuaded the function to always take the code path which would be taken if a concurrent transaction was UPSERTing on the exact same value at the same time. I only looked at INSERT or UPDATE operations, and only methods one and two, since they looked the most promising in my previous tests.

I wrote a Go program which tested the various methods, and got the following results:

InsertOrUpdate1                10000 130578 ns/op
InsertOrUpdate1RowExists       10000 111433 ns/op
InsertOrUpdate1UnderCongestion 10000 176165 ns/op
InsertOrUpdate2                10000 108636 ns/op
InsertOrUpdate2RowExists       10000 152281 ns/op

(There isn't a version of the second method for the "congestion" code path, as that's the exact same code that is executed when the row already exists.)

So now I knew the penalty the transaction forced onto the slow code path was taking. To get a meaningful graph out of it, I used the following observations:

If the row already exists, the number of concurrent transactions doesn't matter.
If c transactions try to concurrently UPSERT a row that does not exist, the one that "wins the race" is as fast as the operation normally would have been, disregarding the concurrent transactions. The rest, c - 1 transactions, are as slow as InsertOrUpdate1UnderCongestion or InsertOrUpdate2RowExists for methods one and two, respectively.

The result was the following graph:

So even under concurrency the first method appears to be faster in applications where more than 30-40% of the time the target row already exists. This matches the results of the previous benchmark.

Update: Fixed some calculations for the second method; I was not charging it any penalty for congestion, which was wrong.

4 comments:

AnonymousMay 1, 2014 at 4:50 PM
What do you think is the best way for a table used for rate-limiting api_requests ?
AndrewJanuary 14, 2015 at 11:38 AM
Hi, thx for great posts. Did you test how different transaction Isolation levels affect on upsert operations? What level was used in benchmark?

Wednesday, April 23, 2014

UPSERTisms in Postgres, part 2: congestion

4 comments: