AI outperforms law professors 75% of the time in Stanford Law School study on legal reasoning

Reviewed by Nidhi Govil

3 Sources

A Stanford Law School study found that AI-generated answers to contract law questions were preferred over responses from law professors in 75% of blind comparisons. The research, involving 16 professors from 14 U.S. law schools, tested whether AI could serve as an effective tutoring tool for law students. Professors' ratings suggested that models such as Google's Gemini 2.5 Pro met professional standards, and AI answers were flagged as harmful only 3.5% of the time, compared to 12% for human answers.

AI Demonstrates Superior Performance in Legal Education Research

A Stanford Law School study led by Professor Julian Nyarko has revealed that AI outperforms law professors in answering student questions, with AI-generated answers preferred in 75% of head-to-head matchups during blind evaluation [1]. The research, titled "Law Professors Prefer AI Over Peer Answers," involved 16 law professors from 14 U.S. law schools, including Stanford, Yale, NYU, University of Chicago, Georgetown, UCLA, and the University of Virginia, who conducted nearly 3,000 anonymized comparisons [2].

The study tested whether large language models could serve as effective tutors for contract law courses by having professors create 40 representative questions that first-year students typically ask during office hours. These weren't simple queries with obvious answers; they required synthesizing complex material, applying it to new situations, and explaining legal concepts in ways that would help students develop analytical skills [1]. Google's Gemini 2.5 Pro won 75.92% of its matchups, while NotebookLM achieved a 74.75% success rate, with both platforms performing comparably to the best human instructor in the study [3].

Source: Reuters

AI Legal Reasoning Challenges Traditional Assumptions

What makes this research particularly significant is that AI in legal education was tested on judgment-based reasoning rather than factual recall. "We focused on law precisely because it requires judgment, nuanced reasoning, and the ability to navigate ambiguity - not just factual recall," Nyarko explained [1]. This represents a departure from previous AI evaluations that focused primarily on subjects with clear right-or-wrong answers.

Yale Law School Professor Sarath Sanga, a co-author of the study, emphasized this distinction: "In most fields where AI gets tested, there's a right answer. In law, there often isn't. Two opposing arguments can both be good. What we wanted to know is whether AI can meet the latent professional standard that lawyers use to evaluate each other's arguments. In this case, the answer was yes" [1]. The researchers found that observed agreement among professors exceeded the level expected if judgments were entirely idiosyncratic, indicating that AI's capabilities in nuanced reasoning reflect alignment with common disciplinary criteria and professional standards [3].

Lower Harm Rates Signal Pedagogical Reliability

Perhaps most striking for educators considering AI as a tutoring tool for law students is the harm assessment. Professors flagged AI-generated answers to contract law questions as pedagogically harmful only 3.5% of the time, compared to 12% for peer-written answers [1]. Gemini recorded a 3.41% harmfulness rate and NotebookLM 3.64%, while human instructors saw a 12.06% rate [3].

This finding suggests that AI could provide reliable on-demand support, so that students need not rely solely on peers or sporadic emails to instructors. "We find that, when evaluated by legal educators, AI tutors can offer high-quality, on-demand support that complements classroom instruction, and may broaden access to expert guidance," said study co-author and Stanford researcher Alejandro Salinas [2].

Source: Decrypt

Implications for Law Schools and the Legal Profession

The study arrives as law schools grapple with how to incorporate rapidly evolving AI into teaching and law practice. A growing number of law schools are mandating AI instruction in students' first year, though approaches vary significantly [2]. The University of California Berkeley School of Law recently adopted a policy significantly curtailing how students may use AI in their academic work, highlighting ongoing debates within the legal profession [2].

In a separate analysis of additional models, Anthropic's Claude Opus 4.7 ranked first, followed by OpenAI's ChatGPT 5.4 and Gemini 2.5 Pro, with every AI model evaluated outperforming human instructors on average [3]. However, the researchers cautioned that the study did not measure whether answers matched each professor's individual teaching preferences, leaving open the possibility that AI-generated responses were viewed as generally acceptable rather than tailored to any one instructor's approach [3].

The legal profession continues to navigate the balance between AI's potential and its risks. In March, the Los Angeles Superior Court began testing AI tools to help judges manage growing caseloads, while law firms confront cases undermined by AI hallucinations and other errors. In April, law firm Sullivan & Cromwell admitted to a U.S. bankruptcy court that a recent filing contained fake citations generated by AI [3]. As Mississippi College School of Law Dean John P. Anderson noted, "The potential benefits of these new technologies as a force multiplier in the practice of law just can't be ignored" [3].

© 2026 TheOutpost.AI. All rights reserved.