Unpacking AI Bias: New Research Reinforces My Reality Check

Your AI Recruiter? Turns out it's more like an over-confident intern – brilliant at drafting, but worryingly inconsistent and prone to hidden biases.

Hello H.A.I.R. Community,

We've all heard the buzz around AI in HR, especially when it comes to sifting through hundreds of CVs in seconds. It sounds like the dream solution, doesn't it? Cutting through the noise to find the perfect candidate. However, my recent "LLM Reality Check" report highlighted a less glamorous truth: off-the-shelf Large Language Models (LLMs) used for CV screening behave more like over-confident interns – fast and smooth, but shockingly inconsistent. Recently, new research has surfaced that not only echoes my findings on instability but also reveals unsettling biases that should make every HR and Talent Acquisition leader pause and reflect.

Now, onto the good stuff…

A study titled "Gender and Positional Biases in LLM-Based Hiring Decisions: Evidence from Comparative CV/Résumé Evaluations" by David Rozado, published in May 2025, presents compelling evidence of inherent biases in LLM-based hiring decisions. This research resonates deeply with my core mission at H.A.I.R.: to empower the human side of AI transformation by providing realistic, practical, and defensible guidance.

What the New Research Reveals

Rozado's experiment involved 22 leading LLMs evaluating professional candidates based on CVs. Each model was given a job description and a pair of CVs identical in every respect except the first names, which signalled gender, and was asked to select the more suitable candidate. The findings were striking:

  • Consistent Female Favouritism: Despite identical qualifications, all LLMs consistently favoured female-named candidates across 70 different professions. This preference was even more pronounced when an explicit gender field (male/female) was added to the CVs.

  • Positional Bias: Beyond gender, most models showed a substantial bias towards selecting the candidate listed first in the prompt. This "first in, first chosen" dynamic reveals a worrying lack of principled reasoning.

  • Pronoun Influence: Interestingly, including preferred pronouns (he/him or she/her) slightly increased the odds of a candidate being selected, regardless of their gender.

  • Masked Gender Parity: When gendered names were replaced with neutral identifiers ("Candidate A" and "Candidate B") and gender assignments were counterbalanced, candidate selections achieved gender parity across all models. This suggests that the overt gender cues were indeed driving the bias (a toy sketch of this set-up follows below).
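
To make that set-up concrete, here's a toy Python sketch of the masking-and-counterbalancing idea. To be clear, this is my own illustration, not Rozado's actual protocol or code; the names, CV snippets, and helper functions are invented for the example.

```python
# Toy reconstruction of masking and counterbalancing:
# gendered names become neutral labels, and each pair is presented
# in both orders so positional bias cancels out in aggregate.
# My own illustrative sketch - not Rozado's protocol or code.

def mask_cv(cv_text: str, name: str, label: str) -> str:
    """Replace a candidate's name with a neutral label like 'Candidate A'."""
    return cv_text.replace(name, label)

def counterbalanced_orders(cv_1: str, cv_2: str) -> list[tuple[str, str]]:
    """Return the masked pair in both presentation orders."""
    return [(cv_1, cv_2), (cv_2, cv_1)]

cv_a = mask_cv("Sarah Jones - 8 years in payroll operations", "Sarah Jones", "Candidate A")
cv_b = mask_cv("James Smith - 8 years in payroll operations", "James Smith", "Candidate B")

prompts = []
for first, second in counterbalanced_orders(cv_a, cv_b):
    prompts.append(
        f"Job description: [role details]\n\nFirst CV:\n{first}\n\n"
        f"Second CV:\n{second}\n\nWhich candidate is more suitable?"
    )
# Each prompt would be sent to the model under test; aggregating choices
# across both orderings separates positional preference from real preference.
```

The point is simple: if a model's choice flips when you merely swap the order, you are measuring the prompt layout, not the candidates.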

Connecting the Dots: Bias and Instability

This research on gender and positional biases aligns perfectly with the "Instability is a compliance risk" message from my "LLM Reality Check" report. My study revealed that commercial LLMs produced only 14% agreement on shortlisted CVs and showed a rank drift of ±2.5 positions daily, meaning a candidate ranked #2 yesterday could be #5 today, with no new information. This volatility and lack of consistent reasoning are precisely the behaviours Rozado's findings expose.
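
If you want to put numbers on that kind of volatility in your own pilots, the sketch below shows two simple checks: shortlist overlap between runs and average rank drift. The metrics and candidate IDs are my own illustrative choices, not the exact methodology behind the report's figures.

```python
# Two simple stability checks for repeated CV-screening runs.
# Metric choices (Jaccard overlap, mean absolute rank change) are
# my own illustrative assumptions, not the report's methodology.

def shortlist_agreement(run_a: list[str], run_b: list[str]) -> float:
    """Jaccard overlap between two shortlists of candidate IDs."""
    a, b = set(run_a), set(run_b)
    return len(a & b) / len(a | b) if a | b else 1.0

def mean_rank_drift(run_a: list[str], run_b: list[str]) -> float:
    """Average absolute change in rank for candidates present in both runs."""
    pos_b = {cand: i for i, cand in enumerate(run_b)}
    shared = [c for c in run_a if c in pos_b]
    if not shared:
        return 0.0
    return sum(abs(run_a.index(c) - pos_b[c]) for c in shared) / len(shared)

yesterday = ["cv_17", "cv_02", "cv_41", "cv_08", "cv_23"]
today = ["cv_02", "cv_41", "cv_17", "cv_30", "cv_08"]

print(f"Shortlist agreement: {shortlist_agreement(yesterday, today):.0%}")    # 67%
print(f"Mean rank drift: {mean_rank_drift(yesterday, today):.1f} positions")  # 1.2 positions
```

Tracked daily, a pair of numbers like these turns a vague sense of "the AI keeps changing its mind" into evidence you can put in front of a compliance team.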

If LLMs are consistently favouring candidates based on gendered names or their position in a list, rather than solely on merit, it compounds the problem of inconsistent outcomes. As Rozado rightly points out, claims of "bias-free insights" from AI systems appear "questionable" in light of these findings. Our collective findings underscore that LLMs are not reasoning from first principles; instead, they are exhibiting behaviours that depart from conventional expectations of fairness.

The Path Forward: Guardrails and Governance

Both studies lead to the same crucial conclusion: treating LLMs as autonomous decision-makers in high-stakes contexts like hiring is premature and risky. The "Over-Confident Intern" needs guardrails.

As I advocate in my report, a "Controlled Copilot" approach is essential. This involves:

  • Human-in-the-Loop: LLMs should augment, not replace, human judgment.

  • Deterministic Behaviour: Striving for programmatic API calls at a temperature of 0 can help reduce output variability (see the sketch after this list).

  • Transparency and Auditability: We need to ensure that the process is transparent and that there's an audit trail, especially given the "invisible disqualifiers" in my study, where 55% of CVs never surfaced in any shortlist.

  • Robust Governance: This includes version pinning of models, continuous variance monitoring, shadow auditing, and clear human override protocols.
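
To make the "Deterministic Behaviour" point concrete, here's a minimal sketch of a guarded screening call using the OpenAI Python SDK. The model name, prompts, and variance probe are illustrative assumptions on my part, not a vendor recommendation or a production pattern.

```python
# Minimal "Controlled Copilot" sketch: deterministic settings plus a
# variance probe. Model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def screen_cv(job_description: str, cv_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",   # pin an exact model version in production
        temperature=0,    # reduces (does not eliminate) run-to-run variance
        seed=42,          # best-effort determinism where the API supports it
        messages=[
            {"role": "system", "content": (
                "You assist a human recruiter. Summarise candidate fit "
                "against the job description; do not make hiring decisions."
            )},
            {"role": "user", "content": f"Job:\n{job_description}\n\nCV:\n{cv_text}"},
        ],
    )
    return response.choices[0].message.content

# Shadow-audit style probe: run the same input twice and flag divergence
# for human review rather than silently accepting either output.
first_pass = screen_cv("[job description]", "[CV text]")
second_pass = screen_cv("[job description]", "[CV text]")
if first_pass != second_pass:
    print("Variance detected despite temperature=0 - escalate to a human.")
```

Even with these settings, outputs can shift when the provider updates the underlying model, which is why version pinning and continuous variance monitoring sit alongside temperature control in the list above.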

The findings from Rozado's research provide further compelling evidence for why HR and TA leaders must remain cautious and strategic when integrating AI into their processes. Deploy AI at the speed of innovation, but govern it at the speed of risk.

Your Call to Action

Understanding these biases is the first step towards building responsible AI frameworks.

What are your thoughts on these findings? Have you observed similar subtle biases in your AI tools?

A First for H.A.I.R.: Public AI Masterclasses

Something I don't normally do...

My three-hour AI workshops are usually reserved for private corporate teams. But after continuous requests, I'm opening up my calendar for a limited number of public sessions this August for the very first time.

These aren't one-hour overview webinars. They are comprehensive, capability-building sessions designed for individual HR and Talent Acquisition professionals. To ensure a high-quality, interactive experience, seats are strictly limited to just 20 per workshop.

Choose the track that's right for you:

Track 1: For Recruiters & TA Professionals – The AI-Powered Recruiter Workshop

This is a practical deep dive into the "how". We'll move beyond basic prompting to build the skills you need to work faster, smarter, and more strategically. You will leave having mastered the PRIME framework in a hands-on session. (Dates: 5th & 26th August)

Track 2: For HR Directors & People Leaders – The AI Readiness Workshop

This is a strategic session focused on the "why" and "what". We'll cover your role as the "Ethical Guardian", build "Guardrails" for your organisation, and develop a responsible AI roadmap. (Dates: 7th & 28th August)

If you're ready to move beyond the hype and build real, practical AI skills, this is your chance. Places are offered on an approval basis to ensure the right mix of professionals.

I encourage you to delve into David Rozado's preprint – it's an important read for anyone serious about AI governance in HR. You can find it on ResearchGate.

And if you haven't already, please download my "LLM Reality Check" report to get the full picture on the instability of off-the-shelf LLMs in CV screening.

Let's continue to empower the human side of AI transformation together.

H.A.I.R. (AI in HR) 

Putting the AI in HR. Safely.
