The Technical Reason Your AI Recruiter Plays Favourites

My "rank roulette" research showed AI's inconsistency. Now we know exactly why it happens, and it's a major problem for HR.

Hello H.A.I.R. Community,

A few months ago, I published my LLM Reality Check field experiment. The results were, to put it mildly, concerning. I found that when commercial large language models screen CVs, they act like "over-confident interns": fast, fluent, but shockingly inconsistent.

Candidates who were ranked #2 one day could drop to #5 the next, with no changes to the data or the prompt. I called this phenomenon "Rank Roulette". The study showed that simply plugging in a commercial AI to screen résumés is unstable and creates serious compliance risks.

The big question left hanging was why. Why are these powerful systems so erratic?

This week, we got a crystal-clear answer.

A brilliant deep-dive blog post by Thinking Machines Lab, titled "Defeating Nondeterminism in LLM Inference", provides the missing piece of the puzzle. And it confirms that the instability I measured is not a bug but a fundamental feature of how these systems currently operate.

The Real Culprit: It's Not Randomness, It's the "Batch"

The common assumption has been that AI's variability comes from something called "temperature", or simple randomness. My study, however, hinted at something deeper: I suspected the issue lay in the model's internal architecture.

The Thinking Machines post confirms this. The core issue is the lack of a technical property called "batch invariance".

Let me explain this with an analogy (you know I love an analogy).

Imagine you ask a navigation app for the fastest route from London to Manchester. The route it gives you should be based on traffic and road closures. It shouldn't change based on whether 10 or 10,000 other people are asking for directions at the exact same millisecond.

But with most LLMs, it does.

AI inference servers, to be efficient, "batch" multiple user requests together and process them simultaneously. The Thinking Machines research demonstrates that the size of this batch, which changes constantly with server load, alters how the underlying floating-point arithmetic is grouped. Because floating-point addition is not associative, that different grouping subtly changes the result of every single request in the batch.

The result? The exact same query, sent twice, can end up in two different batches of different sizes, leading to two slightly different outcomes. When you are ranking candidates, those "slight differences" are the reason a top prospect can vanish from a shortlist without a trace.
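To make this concrete, here is a minimal, illustrative sketch (not the actual inference code) of why grouping matters. It sums the same list of numbers in differently sized chunks, mimicking how a GPU kernel reduces batches of different sizes; the numbers and chunk sizes are my own assumptions for demonstration.

```python
# Illustrative sketch: floating-point addition is not associative,
# so grouping the SAME numbers differently (as different batch sizes
# do on an inference server) can produce slightly different totals.
import random

random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

def chunked_sum(xs, chunk_size):
    """Sum in chunks, then sum the partial results --
    mimicking a reduction over a batch of a given size."""
    partials = [sum(xs[i:i + chunk_size]) for i in range(0, len(xs), chunk_size)]
    return sum(partials)

total_a = chunked_sum(values, 1)    # one long sequential reduction
total_b = chunked_sum(values, 256)  # grouped as a "larger batch" would be

print(abs(total_a - total_b))  # tiny, but typically nonzero
```

The difference is microscopic at the level of a single addition, but an LLM performs billions of them per response, and small numerical drift can tip a token choice, which then cascades through the rest of the output.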

Connecting the Technical Theory to HR Reality

This technical explanation directly validates the practical problems uncovered in my LLM Reality Check study:

  • It Explains "Rank Roulette": The ±2.5 rank volatility I observed is a direct result of this lack of batch invariance. A candidate's CV processed on Monday was in a different server batch than on Tuesday, leading to a different rank.

  • It Confirms the Compliance Risk: This isn't just a technical quirk; it's a governance nightmare. If a candidate's ranking depends on unpredictable server load, how can you possibly defend your process as fair, consistent, or transparent under the EU AI Act or GDPR? You can't. The process is, by its very nature, unstable.

  • It Vindicates the "Copilot" Approach: This is definitive proof that treating AI as an autonomous "set it and forget it" gatekeeper is reckless. The technology's core instability means human oversight and robust guardrails are not optional; they are essential.

What This Means For You as an HR Leader

This gets to the heart of what we discuss at H.A.I.R.: moving beyond the hype to build a defensible AI strategy. The "over-confident intern" needs guardrails.

Here are the key takeaways:

  1. Question Your Vendors: You now have a powerful, specific question for any AI vendor in the recruitment space: "Is your inference process deterministic and batch-invariant?" Ask for proof. Their answer (or lack thereof) will tell you everything you need to know about their commitment to responsible AI.

  2. Reject "Black Box" Excuses: When a tool produces an odd result, vendors often blame the inherent "magic" of AI (or the prompt). We now know that's not the full story. The instability is often a result of specific, controllable engineering choices that prioritise raw speed over consistency.

  3. Prioritise a "Controlled Copilot" Model: The only safe way to use this technology is as a human-in-the-loop assistant. Use it to summarise, draft, and triage—but never to make final, automated decisions without human sign-off. My research outlined a checklist for this, including version pinning and variance monitoring, which are more critical than ever.
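As a sketch of what the variance-monitoring item on that checklist could look like in practice: re-run the same screening request several times and measure how far each candidate's rank moves. Everything here is hypothetical scaffolding (the candidate IDs, the tolerance, and the idea that you can capture repeated rankings from your tool), not a vendor's actual API.

```python
# Hypothetical variance-monitoring guardrail: given several ranked
# lists produced by IDENTICAL requests, flag the process as unstable
# if any candidate's rank moves more than a tolerated amount.
from collections import defaultdict

def rank_spread(runs):
    """Return each candidate's worst-case rank movement across runs."""
    positions = defaultdict(list)
    for ranking in runs:
        for rank, candidate in enumerate(ranking, start=1):
            positions[candidate].append(rank)
    return {c: max(p) - min(p) for c, p in positions.items()}

def is_stable(runs, max_spread=1):
    """True only if no candidate moves more than `max_spread` places."""
    return all(spread <= max_spread for spread in rank_spread(runs).values())

# Three "identical" requests that came back slightly different:
runs = [
    ["ana", "ben", "cho", "dev"],
    ["ana", "cho", "ben", "dev"],
    ["ben", "ana", "cho", "dev"],
]
print(rank_spread(runs))  # {"ana": 1, "ben": 2, "cho": 1, "dev": 0}
print(is_stable(runs))    # False: "ben" moved two places
```

The exact tolerance is a governance decision, but the principle is the point: if you cannot demonstrate that identical inputs produce (near-)identical rankings, you cannot defend the process.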

The work by Thinking Machines is a vital contribution to our field. It gives us the technical language to describe the instability many of us have sensed, and it reinforces the urgent need for robust governance. We must deploy AI at the speed of innovation, but govern it at the speed of risk.

Eunomia HR Relaunch

I’ve relaunched Eunomia HR with a sharper focus:
1️⃣ AI in HR Risk & Compliance Assessments
2️⃣ Fractional AI Governance

To mark it, I’m also sharing free resources, including the Universal Framework for AI in HR, a practical model to help HR leaders adopt AI responsibly and legally.

Here's how H.A.I.R. can help you put the AI in HR:

  1. H.A.I.R. Newsletter: get authoritative, pragmatic, and highly valuable insights on AI in HR directly to your inbox. Subscribe now.

  2. AI Governance QuickScore Assessment: understand your organisation's EU AI Act readiness in minutes and identify key areas for improvement. Take your QuickScore here.

  3. H.A.I.R. Training Courses: enhance your team's AI literacy and readiness with our practical training programmes. Explore courses.

  4. Measure Your Team's AI Readiness with genAssess: stop guessing and start measuring your team's practical AI application skills. Discover genAssess.

Thank you for being part of H.A.I.R. I hope this deep dive helps you navigate the complexities of AI in HR with greater confidence and control.

Until next time,

H.A.I.R. (AI in HR) 

Putting the AI in HR. Safely.
