In my previous post I benchmarked a number of different UPSERTy implementations. However, a number of people (rightfully) asked me how the results would change under concurrency. So I had to come up with a way of testing that.
I wanted reproducible numbers, so I figured that just throwing a lot of UPSERTs at a database was not acceptable, not to mention the problems with keeping that up while maintaining that a certain number of rows pre-exist. Instead, I changed the implementation a bit: with a small change I persuaded the function to always take the code path which would be taken if a concurrent transaction was UPSERTing on the exact same value at the same time. I only looked at INSERT or UPDATE operations, and only methods one and two, since they looked the most promising in my previous tests.
I wrote a
which tested the various methods, and got the following results:
InsertOrUpdate1 10000 130578 ns/op InsertOrUpdate1RowExists 10000 111433 ns/op InsertOrUpdate1UnderCongestion 10000 176165 ns/op InsertOrUpdate2 10000 108636 ns/op InsertOrUpdate2RowExists 10000 152281 ns/op(There isn't a version of the second method for the "congestion" code path, as that's the exact same code that is executed when the row already exists.)
So now I knew the penalty the transaction forced onto the slow code path was taking.
To get a meaningful graph out of it, I used the following observations:
- If the row already exists, the number of concurrent transactions doesn't matter.
- If c transactions try to concurrently UPSERT a row that does not exist, the one that "wins the race" is as fast as the operation normally would have been, disregarding the concurrent transactions. The rest, c - 1 transactions, are as slow as InsertOrUpdate1UnderCongestion or InsertOrUpdate2RowExists for methods one and two, respectively.
The result was the following graph:
So even under concurrency the first method appears to be faster in applications where more than 30-40% of the time the target row already exists. This matches the results of the previous benchmark.
Update: Fixed some calculations for the second method; I was not charging it any penalty for congestion, which was wrong.