Google Vantage Puts AI Avatars in Job Interviews — Scores Soft Skills at Human Expert Level

Google Research published results from Vantage, a system now live at labs.google/vantage that places learners in AI-avatar conversations to assess collaboration, creativity, and critical thinking at scale. An NYU study showed AI scoring achieved a 0.88 correlation with human expert scores on creative writing tasks and matched human inter-rater agreement levels. The tool is moving from research into real hiring and education contexts.

Google Research published a detailed writeup on April 13 describing Vantage, which is now publicly accessible at labs.google/vantage. The system places users in realistic AI-avatar conversations designed to surface soft skills that traditional assessments can't measure: collaboration style, creative problem-solving, and critical thinking under ambiguity.

The architecture has three layers. An Executive LLM steers the simulation in real time, adapting the conversation based on how the subject responds. A separate AI Evaluator scores performance against pedagogical rubrics calibrated to the skill being assessed. A third component generates the avatar behavior and dialogue, maintaining consistent personas across the interaction.
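To make the division of labor concrete, here is a minimal Python sketch of how three such components could fit together. It is an illustration under stated assumptions, not Google's implementation: the class names, the `next_move` heuristic, the canned avatar lines, and the keyword-based rubric are all hypothetical stand-ins for what would be LLM calls in the real system.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str  # "avatar" or "learner"
    text: str


@dataclass
class Executive:
    """Hypothetical stand-in for the Executive LLM: picks the next move."""
    min_words: int = 8

    def next_move(self, history: list[Turn]) -> str:
        learner_turns = [t for t in history if t.speaker == "learner"]
        if not learner_turns:
            return "open"
        # Adapt to the last reply: probe if it was short, otherwise push back.
        if len(learner_turns[-1].text.split()) < self.min_words:
            return "probe"
        return "challenge"


@dataclass
class AvatarGenerator:
    """Hypothetical stand-in for the dialogue layer: one fixed persona."""
    persona: str = "Sam (teammate)"
    lines: dict[str, str] = field(default_factory=lambda: {
        "open": "How should we split up this project?",
        "probe": "Can you say more about why?",
        "challenge": "What if the deadline moves up a week?",
    })

    def render(self, move: str) -> Turn:
        return Turn("avatar", f"{self.persona}: {self.lines[move]}")


@dataclass
class Evaluator:
    """Hypothetical stand-in for the AI Evaluator: counts rubric cues."""
    rubric: dict[str, list[str]]

    def score(self, history: list[Turn]) -> dict[str, int]:
        learner_text = " ".join(
            t.text.lower() for t in history if t.speaker == "learner"
        )
        return {
            dim: sum(cue in learner_text for cue in cues)
            for dim, cues in self.rubric.items()
        }


def run_session(replies, executive, avatar, evaluator):
    """Alternate avatar and learner turns, then score the whole transcript."""
    history: list[Turn] = []
    for reply in replies:
        history.append(avatar.render(executive.next_move(history)))
        history.append(Turn("learner", reply))
    return history, evaluator.score(history)


rubric = {
    "active listening": ["you said", "good point"],
    "building on others' ideas": ["building on", "what if we"],
}
history, scores = run_session(
    ["Not sure.",
     "Good point, building on your idea, what if we pair up on the riskiest task first?"],
    Executive(), AvatarGenerator(), Evaluator(rubric),
)
for turn in history:
    print(turn.text)
print(scores)
```

The point of the separation is that steering, dialogue, and scoring can each be swapped or audited independently. In this sketch the evaluator only ever sees the transcript, never the executive's internal decisions.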

The headline research result is the agreement with human raters. In an NYU study, the AI Evaluator's scores showed a 0.88 correlation with human expert raters on creative writing tasks, a level that matches or exceeds the agreement between two different human experts scoring the same work. For collaboration assessment, AI scoring matched human inter-rater agreement on dimensions like "active listening" and "building on others' ideas."
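For readers who want to see what that comparison involves, the sketch below computes a correlation between AI and human scores and sets it beside the correlation between two human raters. It assumes a Pearson correlation, which the writeup as summarized here does not specify, and the score lists are made-up illustrative values, not data from the NYU study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (spread_x * spread_y)


# Made-up rubric scores (1-5) for six essays. Illustrative only.
human_a = [3, 5, 2, 4, 1, 5]
human_b = [3, 4, 2, 5, 1, 4]
ai_scores = [3, 5, 1, 4, 2, 5]

ai_vs_human = pearson(ai_scores, human_a)
human_vs_human = pearson(human_a, human_b)

print(f"AI vs. human rater A:      {ai_vs_human:.2f}")
print(f"Human rater A vs. rater B: {human_vs_human:.2f}")
print("AI within human-human range:", ai_vs_human >= human_vs_human)
```

The human-versus-human figure is the natural baseline: an automated scorer cannot meaningfully be expected to agree with experts more closely than experts agree with each other.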

The practical implications are significant. Standardized soft-skill assessment has resisted automation for decades because rubric-based scoring of open-ended behavior seemed to require human judgment. Vantage's results suggest that problem is closer to solved than the field has acknowledged.

Google is positioning Vantage for two immediate applications: education (helping students practice and receive feedback on collaborative skills) and hiring (offering structured, comparable assessments that are harder to game than traditional interviews). The fact that it's live on Google Labs — not just a paper — signals this is being treated as a near-term product, not a research artifact.

Panel Takes

The Builder

Developer Perspective

“0.88 correlation with human expert scoring is the number that matters here. If it holds across diverse populations and skill areas — not just the NYU study's sample — you have a scalable assessment primitive that previously required human evaluators at every touchpoint. The API implications for HR software and edtech are obvious.”

The Skeptic

Reality Check

“A single NYU study on creative writing with an unknown sample size is not sufficient validation for high-stakes hiring decisions. AI assessments can also be gamed once people learn the rubrics, and the system likely has demographic biases baked into its evaluation model. Deploying this in hiring before independent auditing could cause real harm to job seekers.”

The Futurist

Big Picture

“Scalable soft-skill assessment is a foundational piece of the AI-native education and talent stack. When every student can get real-time feedback on their collaboration and communication skills — not just their test scores — the compounding effects on workforce development are enormous. This is the kind of infrastructure that reshapes outcomes over a decade.”
