Disposition Data: The Asset That Became a Liability
Why using historical hiring outcomes to train AI is a regulatory dead end. Plus, how to define 'proxy bias'.

Hello H.A.I.R. Community,
The modern Human Resources technology landscape is driven by a singular market thesis: that the historical record of human hiring decisions - the "disposition data" residing in vast, dormant Applicant Tracking System (ATS) archives - constitutes the essential fuel for the next generation of AI agents.
The theory posits that by ingesting millions of data points where Candidate A was hired and Candidate B was rejected, commercial AI models can learn to replicate the intuition of expert recruiters.
However, my extensive analysis of the current regulatory landscape across the EU, UK, and US suggests this narrative is fundamentally flawed. As detailed in my recent research report, strictly enforced compliance frameworks are transforming what was once considered "digital exhaust" into a significant legal risk.
Here is the forensic case against the use of raw disposition data in recruitment AI, based on my review of the EU AI Act, GDPR, UK Data Use and Access Bill, and emerging US employment laws.
1. The Data Quality Problem: "Ghosting" is Now a Compliance Violation
Primary Blocker: EU AI Act, Article 10
The most immediate barrier to using historical data is the legal requirement for data quality. The EU AI Act mandates that datasets used to train High-Risk AI Systems (which include recruitment tools) must be "relevant, sufficiently representative, and to the best extent possible, free of errors".
From a data science perspective, an "error" includes "label noise" - instances where the outcome recorded in the system contradicts reality.
Consider the common phenomenon of "ghosting." Research indicates that a significant percentage of job seekers report being ignored by recruiters. When a qualified candidate applies during a high-volume period and is never reviewed, they are often auto-dispositioned as "Rejected" or left in a null state.
To an AI model, this is a signal. The system analyses the profile of this qualified candidate and associates their characteristics with failure.
Under the EU AI Act, using such data constitutes a statutory violation. By training on raw data containing these "false negatives," vendors fail the "free of errors" requirement. They are not automating intelligence; they are automating historical process failures. The cost to audit and clean this data to regulatory standards is likely prohibitive.
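To make this concrete, below is a minimal sketch of the kind of pre-training check the "free of errors" standard implies. The column names (disposition, review_events, days_to_disposition) are my own illustrative assumptions, not a standard ATS schema, and real exports will vary widely.
```python
# Minimal sketch: flag suspected "false negative" labels in an ATS export.
# Column names are hypothetical; real ATS schemas differ.
import pandas as pd

ats = pd.DataFrame({
    "candidate_id": [101, 102, 103, 104],
    "disposition": ["Rejected", "Hired", "Rejected", "Rejected"],
    "review_events": [0, 4, 2, 0],           # recruiter touches logged
    "days_to_disposition": [90, 21, 14, 90], # auto-closed at requisition close
})

# A "Rejected" label with zero recruiter review events is label noise:
# the candidate was ghosted, not assessed.
suspected_false_negative = (
    (ats["disposition"] == "Rejected") & (ats["review_events"] == 0)
)

noise_rate = suspected_false_negative.mean()
print(f"Suspected label noise: {noise_rate:.0%} of records")
# Records flagged here would need to be excluded or relabelled before any
# "free of errors" claim under Article 10 could be credible.
```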
2. The Purpose Limitation Problem: You Can’t Sell What You Don’t Own
Primary Blocker: GDPR (UK & EU), Article 5(1)(b)
Even if the data were clean, vendors face a formidable legal barrier regarding ownership and usage rights.
Candidates submit their personal data for one narrow purpose: to apply for a specific job. Harvesting that data to train a commercial AI model for sale to other companies is a distinct, secondary purpose.
Vendors often rely on "Legitimate Interest" to justify this processing without asking for consent. However, our research indicates this legal basis is unstable. The "Balancing Test" required by GDPR asks whether the vendor's commercial interest is overridden by the candidate's rights and reasonable expectations. Regulatory precedents suggest that applicants have no reasonable expectation that their CVs will be used to train commercial models years later.
Furthermore, while new legislation like the UK Data Use and Access Bill aims to support innovation, it does not provide a blank cheque. "Training Commercial AI" is notably absent from the Bill's new list of "Recognised Legitimate Interests". This leaves vendors in a precarious position, relying on legal arguments that are vulnerable to enforcement action.
3. The Automation Problem: The End of the 'Rubber Stamp'
Primary Blocker: UK Data Use and Access Bill, Section 80 (New Art. 22A)
For years, many vendors have avoided strict "Automated Decision Making" regulations by keeping a human recruiter "in the loop" to finalise decisions. The argument has been that the AI provides decision support, not the decision itself.
This defence is becoming untenable.
Academic studies cited in our report demonstrate that human recruiters accept AI recommendations at rates as high as 90%, even when the AI is visibly biased. This "Automation Bias" renders the human involvement legally meaningless.
The UK Data Use and Access Bill clarifies that a decision is considered "solely automated" if there is no meaningful human involvement. If a human recruiter merely rubber-stamps an agent's output, the system triggers strict safeguards, including the candidate’s right to contest the decision and demand human intervention.
If an AI agent screens thousands of applicants and the employer is legally required to offer a human appeals process for every rejection, the efficiency gains of the technology are effectively negated.
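For those asking how "meaningful human involvement" could ever be evidenced, one plausible diagnostic is to measure how often recruiters actually overrule the agent. Here is a minimal sketch, assuming a decision log with hypothetical ai_recommendation and human_decision fields:
```python
# Minimal sketch: estimate whether human review is "meaningful" by measuring
# how often recruiters override the AI recommendation. The column names are
# assumptions for illustration, not a standard schema.
import pandas as pd

log = pd.DataFrame({
    "ai_recommendation": ["advance", "reject", "reject", "advance", "reject"],
    "human_decision":    ["advance", "reject", "reject", "advance", "advance"],
})

override_rate = (log["ai_recommendation"] != log["human_decision"]).mean()
print(f"Human override rate: {override_rate:.0%}")

# An override rate near zero is evidence of rubber-stamping: in practice the
# decision is "solely automated" and the Art. 22A-style safeguards apply.
```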
4. The "Controller" Trap: How Vendors Are Weaponising Terms of Service
Primary Blocker: Data Processing Agreements & Contract Law
Perhaps the most dangerous development revealed in our research is not in the legislation, but in the Terms of Service updates currently rolling out across the HR technology sector.
Major platforms and aggregators (such as Indeed) are engaging in a sophisticated legal manoeuvre to secure the asset (the data) while outsourcing the liability (the risk). My analysis of recent vendor terms reveals a standardised "two-step" clause structure:
The Liability Shift: Vendors are explicitly classifying disposition data (hiring decisions, rejection notes, status updates) as "Employer Materials." By doing so, they legally designate the Employer as the sole "Data Controller." This means if the historical data is biased, the Employer, not the Vendor, is liable for any resulting discrimination claims or regulatory fines. The Vendor claims they are merely a "mechanical transmitter."
The Licensing Loophole: In the same document, these vendors often classify the transmission of that data as "User Content." This grants the Vendor a perpetual, worldwide, royalty-free license to use that data to "improve services" - a legal euphemism for training their commercial AI models.
The result is a regulatory paradox: The Vendor extracts the commercial value ("The Oil") to build their Agentic AI, while the Employer is left holding the compliance bag ("The Toxic Waste").
Employers engaging with Agentic AI must now scrutinise their Data Processing Agreements. If you are feeding disposition data back to a platform, ensure you are not inadvertently underwriting the compliance risk for a product they will sell back to you.
5. The Liability Shift: Vendors Can No Longer Hide
Primary Blocker: NYC Local Law 144 & California Draft Employment Regulations
The most consequential shift, however, is occurring in the United States, where regulations are piercing the corporate veil between software vendors and employers.
NYC Local Law 144 mandates independent bias audits of automated employment decision tools. If a vendor trains a model on biased historical data, the audit will likely surface impact ratios that evidence disparate impact, rendering the tool commercially unsellable.
California’s Draft Regulations propose redefining "Employment Agency" to include any entity that procures applicants through an automated-decision system.
This creates Joint Liability. Vendors can no longer claim they simply provide the software while the employer makes the decisions. If the model discriminates because it was trained on biased disposition data, the vendor can face direct legal action.
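For readers unfamiliar with how a bias audit is scored, the core arithmetic is an impact ratio: each category's selection rate divided by the selection rate of the most selected category. A minimal sketch with invented numbers (the four-fifths threshold shown is a common benchmark, not a statutory limit):
```python
# Minimal sketch of the impact-ratio arithmetic behind a bias audit.
# The figures below are invented for illustration, not audit results.
selections = {"group_a": (120, 1000), "group_b": (60, 800)}  # (selected, applicants)

rates = {g: sel / total for g, (sel, total) in selections.items()}
best = max(rates.values())
impact_ratios = {g: rate / best for g, rate in rates.items()}

for group, ratio in impact_ratios.items():
    flag = "review" if ratio < 0.8 else "ok"  # common four-fifths benchmark
    print(f"{group}: selection rate {rates[group]:.1%}, impact ratio {ratio:.2f} ({flag})")
```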
Final Thoughts on Disposition Data
The prevailing market assumption that historical disposition data is a low-cost asset ignores the high cost of regulatory compliance.
Our research concludes that the "meaning" currently encoded in ATS archives is often a record of human bias, process inefficiency, and error, not a reliable record of talent.
To survive this new regulatory environment, the industry must move beyond the indiscriminate harvesting of historical data. In my opinion, the future lies in the use of Synthetic Data, Consent-Based Data Collectives, and strict adherence to data governance standards. The risk of holding "toxic" historical data now far outweighs its potential value.
How we define ‘Proxy Bias’:
Since releasing my analysis on the structural risks in LinkedIn’s recommendation architecture, the most common question I’ve received is: “What exactly is proxy bias?”
Proxy bias occurs when an AI model uses "neutral" data points, such as postcodes, employment gaps, or specific linguistic patterns, as stand-ins for protected characteristics like race, gender, or disability. Even if you explicitly remove sensitive demographic labels to "blind" your hiring process, the algorithm often detects correlations in the remaining data, allowing it to reconstruct the very identities you tried to hide and replicate the bias you sought to eliminate.
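If you want to test your own pipeline for this, one simple diagnostic is to check whether a model can predict the protected attribute from the supposedly "neutral" features alone. Below is a minimal sketch on synthetic data; the feature names are illustrative and not a claim about any specific platform:
```python
# Minimal sketch of a proxy-bias check: if "neutral" features can predict a
# protected attribute well above chance, they encode proxies for it.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
protected = rng.integers(0, 2, n)                  # e.g. a binary demographic label
postcode_band = protected + rng.normal(0, 0.5, n)  # correlated "neutral" feature
employment_gap = rng.normal(0, 1, n)               # uncorrelated feature
X = np.column_stack([postcode_band, employment_gap])

auc = cross_val_score(LogisticRegression(), X, protected,
                      cv=5, scoring="roc_auc").mean()
print(f"Protected attribute predictable from 'neutral' features: AUC = {auc:.2f}")
# AUC near 0.5 means no detectable proxy; well above 0.5 means the features
# can reconstruct the attribute you removed.
```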

Until next time,
H.A.I.R. (AI in HR)
Putting the AI in HR. Safely.