3 Sources
[1]
AI outperforms law professors in blind study
The findings suggest AI could enhance legal education if integrated responsibly. A groundbreaking study led by Stanford Law School Professor Julian Nyarko reveals that law professors overwhelmingly prefer AI-generated answers to student questions over responses written by their fellow instructors - a finding that could reshape how legal education is delivered. The study, titled "Law Professors Prefer AI Over Peer Answers," was conducted with 16 law professors across U.S. law schools and tested whether large language models could serve as effective tutors for contract law courses. In a blind evaluation of nearly 3,000 anonymized comparisons, professors rated AI responses significantly higher than answers written by other professors, with AI winning 75% of head-to-head matchups. "This study challenges important assumptions about AI's role in legal education," said Nyarko, who leads Stanford Law School's Legal Innovation through Frontier Technology Lab, or liftlab. He co-authored the paper with colleagues from Yale, NYU, University of Chicago, and other leading institutions. "We focused on law precisely because it requires judgment, nuanced reasoning, and the ability to navigate ambiguity - not just factual recall." Can LLMs reason? The study is particularly notable because previous AI evaluations have focused primarily on subjects with clear right-or-wrong answers. Legal reasoning, by contrast, demands careful analysis of competing arguments and defensible conclusions. "We were frankly surprised by the magnitude of the results," Nyarko added. "These weren't just simple questions with obvious answers. Many of them required synthesizing complex material, applying it to new situations, and explaining legal concepts in ways that would help students develop their own analytical skills." 
Participants created 40 representative contract law questions that students might ask after class or during office hours, wrote their own answers, and then evaluated responses without knowing whether they came from AI or other participating professors. The AI systems performed comparably to the best human instructor in the study. Perhaps most striking: professors flagged AI responses as pedagogically harmful only 3.5% of the time, compared to 12% for peer-written answers. "In most fields where AI gets tested, there's a right answer. In law, there often isn't," said Sarath Sanga, co-author and professor at Yale Law School. "Two opposing arguments can both be good. What we wanted to know is whether AI can meet the latent professional standard that lawyers use to evaluate each other's arguments. In this case, the answer was yes."
[2]
AI beats law professors in Stanford tutoring study
June 2 (Reuters) - Law professors overwhelmingly preferred answers drafted by AI over ones written by fellow professors, a new Stanford Law School study found, suggesting that the technology is capable of legal reasoning and that law students may benefit from AI tutoring. Professors from 14 U.S. law schools developed a list of 40 questions representative of those first-year contracts students ask during faculty office hours. The professors wrote answers to the questions, and researchers had two AI platforms -- Google's Gemini 2.5 Pro and NotebookLM -- also answer them. The same professors blindly judged the short answers head-to-head and chose the AI-generated ones as most beneficial to students 75% of the time. The AI platforms performed just as well as the professor rated most highly in the study. "We were frankly surprised by the magnitude of the results," lead researcher and Stanford law professor Julian Nyarko said in an article on Stanford's website about the study. "These weren't just simple questions with obvious answers." The study comes as law schools and the legal profession grapple with how to incorporate rapidly evolving AI into teaching and law practice. Earlier studies have found that AI can pass the bar exam, earn A+ law school grades, and effectively grade law school exams. A growing number of law schools are mandating AI instruction in students' first year. But approaches vary. The University of California Berkeley School of Law recently adopted a new policy significantly curtailing how students may use AI in their academic work. The new tutoring study suggests AI may have benefits on the teaching side. Rather than relying on peers or sporadic emails to instructors to answer questions, law students could utilize AI for on-demand answers with reliable results, according to the study.
Importantly, less than 4% of the AI-generated answers were flagged as "harmful" to student learning by the judges, compared with 12% of the professor-written answers. "We find that, when evaluated by legal educators, AI tutors can offer high-quality, on-demand support that complements classroom instruction, and may broaden access to expert guidance," said study co-author and Stanford researcher Alejandro Salinas in the Stanford article. Reporting by Karen Sloan
[3]
AI Lawyers Are Already Better Than Law Professors at Reasoning -- Say Law Professors
Researchers said the results show that large language models can align with professional standards. Law professors preferred answers generated by artificial intelligence over answers written by fellow professors, according to a recent study led by Stanford University that examined how large language models perform on legal reasoning tasks. In the study, 16 professors from 14 U.S. law schools -- including Stanford, Yale, New York University, the University of Chicago, Georgetown, UCLA, and the University of Virginia -- created 40 contract law questions covering legal doctrine, case law, hypotheticals, and policy issues. Researchers saw it as an ideal way to test the capabilities of modern AI. "Large language models (LLMs) are increasingly promoted as educational tutors, yet most evaluations focus on domains with a single ground truth," the researchers wrote. "Many disciplines, however, hinge on judgment: reasoning, weighing ambiguity, and reaching defensible conclusions. Law provides a sharp test." In 2,918 blinded comparisons, professors selected the answer they would rather give a student. Google's Gemini 2.5 Pro won 75.92% of its matchups against human instructors, while the tech giant's NotebookLM won 74.75% of the time, giving AI-generated results the nod over humans in roughly three-quarters of responses. To determine whether the results reflected a broader professional consensus, the researchers analyzed how often professors agreed when evaluating the same answer pairs. "Observed agreement exceeded the level expected if judgments were entirely idiosyncratic, indicating that the LLMs' success reflects alignment with common disciplinary criteria," they wrote. The study found that AI models also outperformed human instructors across multiple categories, including recall questions relating to case, code, or doctrine, hypotheticals, and policy discussions.
"To probe whether any LLM advantage might be driven by surface-level writing style rather than substantive content, we additionally engineered a set of lexico-syntactic features -- answer length, structural organization, reasoning nuance, legal anchors, confidence tone, clarity, and pedagogical support -- and tested how much of the preference pattern they could explain," the study said. AI-generated answers were also flagged as harmful less often than those written by professors, with Gemini recording a 3.41% harmfulness rate and NotebookLM 3.64%, compared with 12.06% for human instructors. In a separate analysis of additional models, Anthropic's Claude Opus 4.7 ranked first, followed by OpenAI's ChatGPT 5.4 and Gemini 2.5 Pro, while every AI model evaluated outperformed human instructors on average. The researchers cautioned that the study did not measure whether the answers matched each professor's individual teaching preferences, leaving open the possibility that AI-generated responses were viewed as generally acceptable rather than tailored to any one instructor's approach. "While LLM responses are generally preferred over those of human instructors, our evaluation setting does not allow us to directly measure the extent to which instructor preferences are satisfied," the study said. "It is at least theoretically possible that LLMs, although generally delivering stronger responses, still generate answers that are merely viewed as 'good enough.'" The study comes as courts, law firms, and law schools increasingly grapple with how artificial intelligence should be used in the legal profession. In March, the Los Angeles Superior Court began testing AI tools to help judges manage growing caseloads, while law schools are adding AI training programs. "The potential benefits of these new technologies as a force multiplier in the practice of law just can't be ignored," Mississippi College School of Law Dean John P. Anderson previously told Decrypt.
"Whether our students plan to be litigators or transactional attorneys, their future employers will expect familiarity with these AI tools. We want the firms hiring our students to be confident that every MC Law grad is competent in AI technologies." At the same time, however, law firms continue to confront cases undermined by hallucinations and other AI-generated errors. In April, law firm Sullivan & Cromwell admitted to a U.S. bankruptcy court that a recent filing in a high-profile case contained fake citations generated by AI.
A Stanford Law School study reveals that AI-generated answers to contract law questions were preferred over responses from law professors in 75% of blind comparisons. The research, involving 16 professors from 14 U.S. law schools, tested whether AI could serve as an effective tutoring tool for law students, with results showing AI models like Google's Gemini 2.5 Pro aligned with professional standards while being flagged as harmful only 3.5% of the time compared to 12% for human answers.
A Stanford Law School study led by Professor Julian Nyarko has revealed that AI outperforms law professors in answering student questions, with AI-generated answers preferred in 75% of head-to-head matchups during blind evaluation [1]. The research, titled "Law Professors Prefer AI Over Peer Answers," involved 16 law professors from 14 U.S. law schools including Stanford, Yale, NYU, University of Chicago, Georgetown, UCLA, and the University of Virginia, who conducted nearly 3,000 anonymized comparisons [2].
The study tested whether large language models could serve as effective tutors for contract law courses by having professors create 40 representative questions that first-year students typically ask during office hours. These weren't simple queries with obvious answers; they required synthesizing complex material, applying it to new situations, and explaining legal concepts in ways that would help students develop analytical skills [1]. Google's Gemini 2.5 Pro won 75.92% of its matchups, while NotebookLM achieved a 74.75% success rate, with both platforms performing comparably to the best human instructor in the study [3].
What makes this research particularly significant is that AI in legal education was tested on judgment-based reasoning rather than factual recall. "We focused on law precisely because it requires judgment, nuanced reasoning, and the ability to navigate ambiguity - not just factual recall," Nyarko explained [1]. This represents a departure from previous AI evaluations that focused primarily on subjects with clear right-or-wrong answers.
Yale Law School Professor Sarath Sanga, a co-author of the study, emphasized this distinction: "In most fields where AI gets tested, there's a right answer. In law, there often isn't. Two opposing arguments can both be good. What we wanted to know is whether AI can meet the latent professional standard that lawyers use to evaluate each other's arguments. In this case, the answer was yes" [1]. The researchers found that observed agreement among professors exceeded the level expected if judgments were entirely idiosyncratic, indicating that AI's capabilities in nuanced reasoning reflect alignment with common disciplinary criteria and professional standards [3].
Perhaps most striking for educators considering AI as a tutoring tool for law students is the harm assessment. Professors flagged AI-generated answers to contract law questions as pedagogically harmful only 3.5% of the time, compared to 12% for peer-written answers [1]. Gemini recorded a 3.41% harmfulness rate and NotebookLM 3.64%, while human instructors saw a 12.06% rate [3].
This finding suggests that AI could provide on-demand support with reliable results rather than students relying solely on peers or sporadic emails to instructors. "We find that, when evaluated by legal educators, AI tutors can offer high-quality, on-demand support that complements classroom instruction, and may broaden access to expert guidance," said study co-author and Stanford researcher Alejandro Salinas [2].
The study arrives as law schools grapple with how to incorporate rapidly evolving AI into teaching and law practice. A growing number of law schools are mandating AI instruction in students' first year, though approaches vary significantly [2]. The University of California Berkeley School of Law recently adopted a policy significantly curtailing how students may use AI in their academic work, highlighting ongoing debates within the legal profession [2].
In a separate analysis of additional models, Anthropic's Claude Opus 4.7 ranked first, followed by OpenAI's ChatGPT 5.4 and Gemini 2.5 Pro, with every AI model evaluated outperforming human instructors on average [3]. However, the researchers cautioned that the study did not measure whether answers matched each professor's individual teaching preferences, leaving open the possibility that AI-generated responses were viewed as generally acceptable rather than tailored to any one instructor's approach [3].
The legal profession continues to navigate the balance between AI's potential and its risks. In March, the Los Angeles Superior Court began testing AI tools to help judges manage growing caseloads, while law firms confront cases undermined by AI hallucinations and other errors. In April, law firm Sullivan & Cromwell admitted to a U.S. bankruptcy court that a recent filing contained fake citations generated by AI [3]. As Mississippi College School of Law Dean John P. Anderson noted, "The potential benefits of these new technologies as a force multiplier in the practice of law just can't be ignored" [3].
Summarized by Navi