In my previous post I benchmarked a number of different UPSERT implementations. However, several people (rightly) asked how the results would change under concurrency, so I had to come up with a way of testing that.
I wanted reproducible numbers, so simply throwing a lot of concurrent UPSERTs at a database would not do, not to mention the difficulty of sustaining that load while ensuring that a certain fraction of the target rows pre-exist. Instead, I changed the implementation slightly: with a small change I persuaded the function to always take the code path it would take if a concurrent transaction were UPSERTing the exact same value at the same time. I only looked at INSERT-or-UPDATE operations, and only at methods one and two, since they looked the most promising in my previous tests.
I wrote a Go program which tested the various methods, and got the following results:
InsertOrUpdate1                   10000    130578 ns/op
InsertOrUpdate1RowExists          10000    111433 ns/op
InsertOrUpdate1UnderCongestion    10000    176165 ns/op
InsertOrUpdate2                   10000    108636 ns/op
InsertOrUpdate2RowExists          10000    152281 ns/op

(There isn't a "congestion" version of the second method, as that code path is exactly the same one taken when the row already exists.)
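The output above is the standard Go benchmark format: iteration count followed by nanoseconds per operation. A minimal sketch of how such figures can be produced from a plain program follows; the `upsert` stub and its query are assumptions standing in for the real database calls, which are not shown in this post.

```go
package main

import (
	"fmt"
	"testing"
)

// upsert stands in for one of the real methods, which would issue the
// actual INSERT/UPDATE statements against PostgreSQL via database/sql,
// e.g. db.Exec("SELECT upsert_1($1, $2)", key, value). The query name
// here is hypothetical.
func upsert() {
	// real database round-trip omitted
}

func main() {
	// testing.Benchmark lets a plain program produce the same
	// "N ns/op" figures as `go test -bench` would.
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			upsert()
		}
	})
	fmt.Println("InsertOrUpdate1", r.N, r.NsPerOp(), "ns/op")
}
```

Using `testing.Benchmark` directly (rather than `go test -bench`) makes it easy to loop over the different methods and congestion scenarios in one program.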
So now I knew what penalty a transaction forced onto the slow code path was paying.
To get a meaningful graph out of it, I used the following observations:
- If the row already exists, the number of concurrent transactions doesn't matter.
- If c transactions try to concurrently UPSERT a row that does not exist, the one that "wins the race" is as fast as the operation would normally have been, disregarding the concurrent transactions. The remaining c - 1 transactions are as slow as InsertOrUpdate1UnderCongestion or InsertOrUpdate2RowExists for methods one and two, respectively.
The result was the following graph:
So even under concurrency, the first method appears to be faster in applications where the target row already exists more than 30-40% of the time. This matches the results of the previous benchmark.
Update: Fixed some calculations for the second method; I was not charging it any penalty for congestion, which was wrong.
What do you think is the best way to structure a table used for rate-limiting api_requests?
I'd say not to use a table for that at all; something like Redis would seem to be a much better fit, and it keeps all the load off your DB (presumably a goal, if not the primary goal, of the rate-limiting).
Hi, thanks for the great posts. Did you test how different transaction isolation levels affect the upsert operations? What level was used in the benchmark?
Everything was tested in READ COMMITTED. I don't think any of the approaches work correctly under REPEATABLE READ or SERIALIZABLE; they probably all have a corner case where they just loop forever.