How do I benchmark a spin lock?

Wed Nov 26 10:43:39 IST 2014

On Tue, Nov 25, 2014 at 08:56:01PM +0000, Elazar Leibovich wrote:

> The first question I have in mind, is, how do you define a lock
> benchmark?  Is your goal to minimize overhead? Is your goal to
> minimize the latency of a successful uncontended acquire? Is your
> goal to minimize bus load for other CPU when three CPUs are waiting
> for the spin lock?

Good questions all, but usually you want to focus on the important
case first. I would argue that for a spinlock, the most important case
is an uncontended acquire, since if most acquires are contended, then
you should redesign the code that uses the lock. The second most
important case is the contended acquire/contended release.

> What we're measuring is not well defined.
> 
> Looking at [0] (wow, So little have changed in SMP architectures since
> 1990!) and [1], gives a few options:
> 
> 1) Measure the time it takes to perform a critical section *N* times
> by *n *CPUs
> concurrently.

This is measuring the contended acquire/contended release.

> 2) Measure overhead. Compare the time it takes to a single CPU to do a task
> *N* times with no locks. Do that with *n* CPUs, where the task is protected
> by a spin lock. The time it took to do the task with spinlock, minus the
> time it took to do it alone is the overhead.

... is the overhead for *N* CPUs on this architecture. This is again
measuring the contended case.

> 3) Measure the overhead of other *m* CPUs doing a different task,
> while *n* CPUs are contending over a critical section.

This is measuring the interference caused by the bus traffic due to
the contention. It's interesting, but it's hard to know just form this
benchmark what is reasonable and what isn't, and how to improve the
results. A benchmark whose results aren't actionable is not as useful
as a benchmark where it's immediately obvious what you need to fix.

> Is there an industry standard for benchmarking a spinlock? Something
> like JS octane. A benchmark which should mimic many real world
> scenarios?

Not as far as I know. Locks are usually benchmarked using a
micro-benchmark that performs many acquires/releases in a loop with
exact details depending on how the lock is likely to be used "in
anger". Before benchmarking, you should first ask yourself
(1) how is this lock implementation likely to be used? (2) what are
the fast-path, common operations, and what are the slow-paths,
uncommon operations? And then benchmark the fast-path operations in a
setup that mimics how the implementation is likely to be used.

Cheers,
Muli