What Is a Benchmark?

Most people don’t realise that benchmarks pre-date computers by more than a century.

The idea comes from land surveying. Long before software or databases existed, surveyors faced a simple problem: how do you measure something today and be confident you’re measuring the same thing months or years later?

The answer was the BENCH MARK — a fixed reference point, literally set into stone, that could be returned to again and again, knowing it hadn’t moved.

That original idea still matters.


The Core of a Benchmark: Repeatability

At its heart, a benchmark is about repeatability.

If you can’t repeat a measurement and get a comparable result, you don’t really have a benchmark — you just have a number.

That’s as true for database benchmarks as it was for surveying:

  • the same workload

  • the same schema

  • the same scale

  • the same rules

  • the same measurement method

Change those, and you’ve changed what’s being measured.
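
To make that concrete, here is a minimal sketch in Python (not from any particular benchmarking tool; the field names are assumptions for illustration). The benchmark definition is pinned as a frozen record and fingerprinted, so if any element changes, the fingerprint changes and the results stop being comparable.

    # Illustrative sketch: pin the benchmark definition and fingerprint it.
    # The field names are assumptions for this example, not a real tool's API.
    from dataclasses import dataclass, asdict
    import hashlib
    import json

    @dataclass(frozen=True)
    class BenchmarkSpec:
        workload: str      # e.g. an OLTP transaction mix
        schema: str        # schema/DDL identifier
        scale: int         # scale factor / data volume
        rules: str         # run rules: duration, warm-up, and so on
        measurement: str   # what is measured, and how

        def fingerprint(self) -> str:
            # Hash the canonical form: change any element above and the
            # fingerprint changes, making stale comparisons detectable.
            canonical = json.dumps(asdict(self), sort_keys=True)
            return hashlib.sha256(canonical.encode()).hexdigest()[:12]

    spec = BenchmarkSpec("oltp-mix-v1", "schema-v1", scale=100,
                         rules="5 min warm-up, 20 min timed run",
                         measurement="transactions per minute")
    print(spec.fingerprint())   # store this alongside every result

Storing the fingerprint with every result is what lets two runs, months apart, prove they were measuring the same thing.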

Experimentation isn’t the problem — it’s the point. A good benchmark makes experimentation meaningful.

When the benchmark is repeatable, changing a single variable — hardware, configuration, software version, or scale — means any difference you observe can reasonably be attributed to that change. Cause and effect become visible.
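
Continuing the hypothetical fingerprint idea from the sketch above (the run records and numbers here are invented), the same discipline can be written down directly: refuse to attribute a delta unless the benchmark matches and exactly one system variable differs.

    # Illustrative sketch: attribute a performance delta only when the
    # benchmark is identical and exactly one system variable changed.
    def attribute_change(run_a: dict, run_b: dict) -> str:
        if run_a["spec_fingerprint"] != run_b["spec_fingerprint"]:
            raise ValueError("different benchmarks: results are not comparable")
        system_a, system_b = run_a["system"], run_b["system"]
        changed = [k for k in system_a if system_a[k] != system_b[k]]
        if len(changed) != 1:
            raise ValueError(f"{len(changed)} variables changed: cause is ambiguous")
        var = changed[0]
        delta = run_b["score"] / run_a["score"] - 1
        return f"{var}: {system_a[var]} -> {system_b[var]}, {delta:+.1%}"

    # Hypothetical runs: same benchmark, only the software version differs.
    run_a = {"spec_fingerprint": "9f2c1ab04e77", "score": 412_000,
             "system": {"hardware": "gen1", "config": "default", "version": "15.4"}}
    run_b = {"spec_fingerprint": "9f2c1ab04e77", "score": 448_000,
             "system": {"hardware": "gen1", "config": "default", "version": "16.1"}}
    print(attribute_change(run_a, run_b))   # version: 15.4 -> 16.1, +8.7%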

But without a stable reference point, you’re guessing.


“Fine or Imprisonment for Disturbing This Mark”

Many historic survey benchmarks, particularly metal markers, include a warning stamped directly into them:

“Fine or imprisonment for disturbing this mark.”

The mark itself wasn’t what was valuable; what was valuable was trust in the measurement. If the mark could be moved, every future measurement taken from it would be suspect. Accumulated knowledge depended on that reference point remaining exactly where it was.


“We Only Benchmark Our Production Systems”

The problem is that production is rarely a single system.

Over time, organisations accumulate many databases, platforms, and teams. People move on, systems evolve, and each measurement captures a moment that can’t easily be compared with the next.

Each result may be valid on its own, but without a common benchmark none of them relate to each other. What remains is a set of isolated measurements, tied to individual systems and to people with different skill-sets, opinions and biases.

And once that happens, you don’t really have a database strategy at all.

You end up with a sprawl: different databases, different platforms, different operating systems, some on-prem, some in the cloud. Each measured differently, at a different time. The numbers don’t line up, and there’s no consistent view of performance — or of what any of it actually costs.


Why Repeatability Unlocks Understanding

Once a benchmark is accurate and repeatable, something important changes: you can start to understand why things behave the way they do.

You can:

  • compare hardware generations

  • evaluate configuration changes

  • understand scaling behaviour

  • measure cost versus performance (illustrated in the sketch below)

  • make decisions based on evidence rather than instinct

The goal isn’t to recreate every detail of production. It’s to create a stable reference point you can return to — today, next month, or next year — and trust that the comparison still holds.
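
As a small illustration of the cost-versus-performance point above, with invented figures: once two platforms have run the same benchmark, cost per unit of work is simple arithmetic.

    # Hypothetical figures, purely to show the arithmetic: with the same
    # benchmark behind both scores, cost per unit of work is comparable.
    platforms = {
        "on-prem gen2":  {"throughput_tpm": 500_000, "cost_per_hour": 12.0},
        "cloud 16 vCPU": {"throughput_tpm": 350_000, "cost_per_hour": 6.5},
    }

    for name, p in platforms.items():
        txn_per_hour = p["throughput_tpm"] * 60
        cost_per_million = p["cost_per_hour"] / txn_per_hour * 1_000_000
        print(f"{name}: ${cost_per_million:.3f} per million transactions")
    # on-prem gen2: $0.400 per million transactions
    # cloud 16 vCPU: $0.310 per million transactions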


In summary

The concept of the benchmark has lasted for centuries because benchmarks are repeatable.

Long before databases or computers, someone looked at the problem of measurement and said: “If we want to measure something time and time again, we need a fixed reference point we can trust.”

That simple idea is still the foundation of benchmarking today.
