2 Sources
[1]
Anthropic has to keep revising its technical interview test so you can't cheat on it with Claude
Since 2024, Anthropic's performance optimization team has given job applicants a take-home test to make sure they know their stuff. But as AI coding tools have gotten better, the test has had to change repeatedly to stay ahead of AI-assisted cheating. Team lead Tristan Hume described the history of the challenge in a blog post on Wednesday. "Each new Claude model has forced us to redesign the test," Hume writes. "When given the same time limit, Claude Opus 4 outperformed most human applicants. That still allowed us to distinguish the strongest candidates -- but then, Claude Opus 4.5 matched even those." The result is a serious candidate-assessment problem. Without in-person proctoring, there's no way to ensure someone isn't using AI to cheat on the test -- and if they do, they'll quickly rise to the top. "Under the constraints of the take-home test, we no longer had a way to distinguish between the output of our top candidates and our most capable model," Hume writes. The issue of AI cheating is already wreaking havoc at schools and universities around the world, so it's ironic that AI labs are having to deal with it too. But Anthropic is also uniquely well-equipped to deal with the problem. In the end, Hume designed a new test that had less to do with optimizing hardware, making it sufficiently novel to stump contemporary AI tools. As part of the post, he shared the original test to see if anyone reading could come up with a better solution. "If you can best Opus 4.5," the post reads, "we'd love to hear from you."
[2]
Anthropic overhauls hiring tests due to Claude AI
Anthropic has repeatedly revised its take-home technical interview test for job applicants since 2024 to mitigate AI-assisted cheating. The performance optimization team, responsible for administering the test, found that advancements in AI coding tools necessitated these changes. Team lead Tristan Hume stated in a Wednesday blog post that each new Claude model has compelled test redesigns. Claude Opus 4 surpassed most human applicants when given the same time limit, while Claude Opus 4.5 matched the performance of top candidates. This eliminated Anthropic's ability to differentiate between the work of leading human applicants and its most advanced AI model under the take-home test conditions. Hume developed a new test focusing less on hardware optimization, making it sufficiently complex to challenge current AI tools. The original test was also shared in the blog post, inviting readers to propose alternative solutions. The post indicated, "If you can best Opus 4.5, we'd love to hear from you."
Anthropic has been forced to repeatedly revise its technical interview test since 2024 as its own AI models have grown powerful enough to outperform human applicants. Claude Opus 4.5 now matches even the strongest candidates, creating a serious challenge for distinguishing genuine talent from AI-assisted submissions in take-home assessments.
Anthropic has encountered an ironic dilemma that highlights the rapid advancement of AI coding tools: its own AI models have become so capable that they're undermining the company's ability to evaluate human candidates. Since 2024, the performance optimization team at Anthropic has administered a take-home test to job applicants, but each iteration of Claude AI has forced the company to redesign technical assessments to stay ahead of AI-assisted cheating.[1][2]

Team lead Tristan Hume described the escalating challenge in a blog post published Wednesday, explaining how the company's hiring test has evolved alongside its AI capabilities. "Each new Claude model has forced us to redesign the test," Hume wrote, underscoring the relentless pace at which AI labs must adapt their recruitment strategies.[1]

The progression of Claude's capabilities tells a striking story about AI advancement. When given the same time limit as human applicants, Claude Opus 4 outperformed most candidates, though it still allowed Anthropic to identify the strongest performers. However, Claude Opus 4.5 raised the stakes considerably by matching even those top-tier candidates, creating what Hume describes as a serious candidate-assessment problem.[1][2]
Source: TechCrunch
"Under the constraints of the take-home test, we no longer had a way to distinguish between the output of our top candidates and our most capable model," Hume explained in the blog post. Without in-person proctoring, there's simply no reliable method to ensure job applicants aren't leveraging AI to complete the assessment, and those who do will inevitably rise to the top of the candidate pool.[1]

To address this challenge, Hume developed a new test that shifted focus away from hardware optimization, making it sufficiently novel and complex to stump contemporary AI tools. The irony isn't lost that AI-assisted cheating, already causing disruption at schools and universities worldwide, now affects the very AI labs creating these powerful models. Yet Anthropic's unique position as both the problem's source and victim gives it distinct advantages in combating the issue.[1][2]

As part of the blog post, Hume shared the original test publicly, inviting readers to propose better solutions or demonstrate their abilities. "If you can best Opus 4.5, we'd love to hear from you," the post reads, turning the challenge into both a recruitment opportunity and a crowdsourced problem-solving exercise.[1][2]
This situation raises critical questions about the future of remote technical assessments across the tech industry. As AI coding tools continue advancing, companies face mounting pressure to rethink how they identify genuine talent. The short-term solution may involve more creative, novel problems that current models struggle with, but the long-term trajectory suggests a fundamental shift away from traditional take-home tests toward formats that better authenticate human work. Organizations should watch how leading AI labs adapt their recruitment strategies, as these approaches will likely influence hiring practices across the broader technology sector.
Summarized by
Navi