Your Next Job Interviewer: An AI Agent
They Will Likely Be More Consistent and Less Biased Than Your Boss.
Large language models now improve every week. To measure this progress, researchers created a set of very challenging, cutting-edge questions from a wide range of technical fields. They called it Humanity’s Last Exam (which it is assuredly not). A month ago, the best LLMs scored 8%. This week, they scored 26%. Problem-solving is not identical to thinking, but this is remarkable progress.
AI Agents have already proven better than most humans at several valuable tasks. A partial list would include:
Driving. Swiss Re concluded that self-driving Waymos generate 88% fewer property damage claims and 92% fewer bodily injury claims than human drivers. If those results hold at scale, autonomous vehicles would save tens of thousands of lives annually.1
Medical diagnosis. One small study found that LLMs outperform physicians at some medical diagnostic tasks. Oddly, the models performed significantly better on their own than physicians using AI tools did.
Drug discovery. LLMs are transforming drug discovery. They excel at enhancing molecular design, predicting drug interactions, and accelerating clinical trials. LLMs can analyze massive datasets, including scientific papers, patents, and clinical trial data, to identify candidate drug targets and flag potential toxicities and side effects. They can also find new uses for existing drugs (for example, researchers at BenevolentAI used their AI platform to determine that Baricitinib, an approved arthritis drug, could treat Covid).
Software Development. Five years ago, a Computer Science degree reliably delivered a middle-class job. In 2022, Microsoft's GitHub launched Copilot, which rapidly accelerated developer productivity. Since then, companies have been using AI to improve the AI tools that write software. As that iteration loop tightens, AI reduces the need for dedicated software developers. Despite a robust economy, demand for software developers has dropped.
We Are Bad at Hiring
American managers hire between five and six million people monthly—about a quarter million each workday—and are terrible at it. Like all of us, managers place too much value on personality, impressions from unstructured interviews, and work experience. We undervalue tests of cognitive ability, structured interviews, and work samples.2
Hiring biases are not easy for individuals to overcome. We favor people with charisma or a prestigious degree (the so-called “halo effect”). We prefer candidates who share our age, gender, background, or personality, not because we are motivated by prejudice but because we evolved in tribes. Managers form hiring judgments in the first 90 seconds of an interview – long before they can make any meaningful assessment. They rate physically attractive candidates as more competent than equally skilled people. And like ordinary judges, interviewers are less generous when hungry, after a bad night’s sleep, or after a recent fight with their partner.
Worse, we do not improve with practice. I doubt that my hundredth hire performed better than my first. We are far too likely to “trust our intuition” because we rarely measure hiring success. If nobody keeps score, managers never receive feedback on their hiring decisions.
Hiring decisions are expensive. HR professionals put the direct cost of a new hire at nearly $4,700 in one survey and $2,700 in another. These figures count only direct hiring expenses like recruiting ads, agency fees, and relocation costs. Once you add hiring-manager time and productivity losses during onboarding, some employers estimate that a new hire costs three to four times the salary. Add the cost of bad hires, and the total climbs higher still – especially when set against the value of getting it right.
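To make those multipliers concrete, here is a back-of-the-envelope sketch. The $90,000 salary is a hypothetical assumption; the $4,700 direct cost and the three-to-four-times multiplier are the figures cited above.

```python
# Back-of-the-envelope hiring cost estimate. The salary below is a hypothetical
# assumption; the direct cost and multipliers are the figures cited in the text.

DIRECT_COST = 4_700          # recruiting ads, agency fees, relocation
LOADED_MULTIPLIERS = (3, 4)  # hiring-manager time plus onboarding productivity loss

def fully_loaded_cost(salary: int) -> tuple[int, int]:
    """Return (low, high) estimates of what one hire really costs."""
    return salary * LOADED_MULTIPLIERS[0], salary * LOADED_MULTIPLIERS[1]

low, high = fully_loaded_cost(90_000)
print(f"Direct cost only:  ${DIRECT_COST:,}")
print(f"Fully loaded cost: ${low:,} - ${high:,}")  # $270,000 - $360,000
```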
AI Tools Help Candidates More than Employers
Can machine learning models improve this process? Not if we use historical hiring data to train AI agents. As many researchers have pointed out, AIs can easily inherit human biases. Fortunately, bias is much easier to discern in AIs than in humans.
LLMs can make less biased decisions only if they are trained on well-defined job-performance data. They must also prioritize skills, qualifications, and achievements over proxies like educational background, zip codes, prior employers, or preferences based on gender, race, or age. And LLM hiring tools must be fully auditable to prevent discriminatory decisions based on hidden correlations in the data.
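What would "fully auditable" look like in practice? Here is a minimal sketch of one common check, an adverse-impact audit using the four-fifths rule, run over a tool's screening decisions. The groups and decisions are invented for illustration; a real audit would pull decisions from the tool's logs and test many attributes and intersections.

```python
# Minimal sketch of an adverse-impact audit over AI screening decisions.
# The groups and decisions below are illustrative, not real hiring data.
from collections import defaultdict

decisions = [
    # (protected_group, advanced_to_next_round)
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

totals, passes = defaultdict(int), defaultdict(int)
for group, advanced in decisions:
    totals[group] += 1
    passes[group] += advanced

rates = {group: passes[group] / totals[group] for group in totals}
best_rate = max(rates.values())

for group, rate in rates.items():
    ratio = rate / best_rate
    flag = "FLAG" if ratio < 0.8 else "ok"  # four-fifths (80%) rule of thumb
    print(f"{group}: pass rate {rate:.0%}, impact ratio {ratio:.2f} -> {flag}")
```

A deployed tool could run checks like this continuously and surface the results to employers, candidates, and regulators.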
Many startups currently focus on building AI voice agents that interview candidates to assess basic skills. These tedious, high-volume, low-ROI tasks have simple performance rubrics. Automating them reduces costs, removes bottlenecks, and improves hiring velocity. A recent Andreessen Horowitz slide maps startups in the voice interviewing landscape.
Does this work? AI agents can hold credible conversations using avatars that sound like junior HR staffers. Watch a screening interview for an entry-level software developer here. The value of screening interviews like this is speed and consistency across dozens of candidates. AI agents can also adapt to accents and interview candidates in multiple languages. The latest multi-modal agents assess a candidate's energy, tone, and body language alongside their answers. They could also summarize a candidate’s public-facing blogs, social media accounts, and LinkedIn profile.
Naturally, candidates quickly adapt to all of this. They use apps to ensure that their resumes get a green light from CV-screening software. They consult Reddit boards that share typical questions posed by AI (or human) interviewers. Voice agents coach applicants on how to interview better with other voice agents. What do candidates think? In one survey, 86% preferred AI interviews to human ones.
Will these tools improve hiring? They might. These systems are faster, more transparent, and weigh hiring factors more consistently than people do. Ideally, they free recruiters to assess and sell fewer, higher-quality candidates.
Unintended Consequences
Of course, AI-driven hiring tools will make spectacular mistakes. As with self-driving cars, these mistakes will confirm a natural human tendency to distrust them—even though they vastly outperform humans in the aggregate.
As these tools grow, they will be subject to Jevons Paradox, which holds that when a technology makes something cheaper to use, total demand for it can rise rather than fall. When LEDs that last ten times longer and use a tenth as much power made lighting cheaper, we didn’t always spend less on lighting – we often lit up more things. To the extent that we automate hiring processes, more people will be tempted to apply for more jobs. This might be a good thing, or, as arguably occurred with automated college admissions applications and one-click LinkedIn job applications, it might simply drive up stress and increase system fragility.
AI hiring tools will likely have other unintended consequences, including increased salary transparency. I generally favor Norwegian levels of salary transparency because they empower candidates and reduce unwarranted pay disparities.3 LLMs make it much easier to access salary data tailored to specific roles, industries, and locations. Some employers now use LLMs to proactively surface subtle pay disparities. Recently, many states and cities have begun to require job ads to include salary information. These rules give LLMs more and better data, making employers less protective of aggregated salary information.
The quality of our hiring matters. We hire a lot, and figuring out how to do it better can increase satisfaction, raise productivity, and make pay raises possible. Ultimately, hiring is a vast matching problem: the challenge is to align the qualifications, skills, and preferences of candidates with the requirements and culture of organizations. It is a problem that LLM technology is well suited to help solve, even if human biases will (and perhaps should) always complicate the process.
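To make the matching framing concrete, here is a toy sketch that scores candidates against a role by skill overlap and location preference. The candidates, skills, and weights are invented for illustration; a production system would use far richer signals and, ideally, validated performance data.

```python
# Toy sketch of hiring as a matching problem: score candidates against a role
# by skill overlap and preference fit. All names, skills, and weights are
# made-up illustrations, not a real scoring model.

role = {
    "required_skills": {"python", "sql", "testing"},
    "location": "remote",
}

candidates = [
    {"name": "A", "skills": {"python", "sql", "docker"}, "preferred_location": "remote"},
    {"name": "B", "skills": {"java", "testing"}, "preferred_location": "onsite"},
]

def match_score(candidate, role, skill_weight=0.8, location_weight=0.2):
    """Weighted overlap between a candidate and a role, from 0.0 to 1.0."""
    required = role["required_skills"]
    skill_fit = len(candidate["skills"] & required) / len(required)
    location_fit = 1.0 if candidate["preferred_location"] == role["location"] else 0.0
    return skill_weight * skill_fit + location_weight * location_fit

for c in sorted(candidates, key=lambda c: match_score(c, role), reverse=True):
    print(f"{c['name']}: {match_score(c, role):.2f}")
```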
1. Motor vehicles kill 40,000 Americans each year. These dry statistics reflect the tragic loss of treasured lives. This week, an Oakland driver hit and killed Michael Burawoy, a renowned and beloved Berkeley sociologist.
2. One large meta-study of performance predictors found that work samples, cognitive ability tests, and structured interviews are moderate to strong predictors of job performance. Years of experience and unstructured interviews were weak predictors.
3. Norway makes salaries public by making tax returns open to public inspection. However, Norway also discloses to taxpayers the name of any person who asks to look at their returns. This seems to discourage neighbors from prying.