Microsoft Copilot integrates rival AI models to fact-check research and reduce hallucinations

Reviewed byNidhi Govil

9 Sources

Share

Microsoft unveiled major AI upgrades to its Copilot research assistant, introducing a multi-model approach where OpenAI's GPT and Anthropic's Claude work together. The new Critique feature has GPT draft responses while Claude reviews for accuracy, boosting performance by 13.8% on industry benchmarks and addressing persistent AI hallucinations that plague single-model systems.

News article

Microsoft Copilot Deploys Multi-Model Approach to Improve Research Accuracy

Microsoft announced significant AI upgrades to its Copilot research assistant on March 30, introducing a collaborative framework where multiple AI models work together within the same workflow

1

. The centerpiece is a new Critique feature that integrates rival AI models from OpenAI and Anthropic to deliver more reliable outputs. Under this system, GPT generates initial responses while Claude reviews the output for accuracy, completeness, and citation quality before presenting it to users

3

. This represents a strategic shift from single-model reliance to orchestrated collaboration between competing systems.

GPT and Claude Collaboration Tackles AI Hallucinations

The Critique feature addresses one of the most persistent challenges in enterprise AI: hallucinations where systems generate false information. Nicole Herskowitz, corporate vice president of Microsoft 365 and Copilot, explained that having various different models from different vendors working together takes the technology "to the next level, where customers actually get the benefits of the models working together"

1

. The multi-model approach creates what Microsoft describes as "a powerful feedback loop that delivers higher-quality results across factual accuracy, analytical breadth, and presentation"

3

. Microsoft expects this workflow to eventually become bi-directional, allowing GPT to review Claude's drafts as well

1

.

Researcher Agent Scores 13.8% Higher on DRACO Benchmark

The impact of these AI upgrades is measurable. The Researcher agent with Critique enabled now scores 13.8% higher on the DRACO benchmark—the industry standard for Deep Research Accuracy, Completeness, and Objectivity

2

. Achieving 57.4% on this benchmark, the system outperforms standalone deep-research tools from OpenAI, Google, Perplexity, and Anthropic

4

. This makes it more than twice as reliable as Deep Research with OpenAI's o4-mini model and superior to Gemini Deep Research and Claude Opus 4.6

5

.

Model Council Enables Side-by-Side Comparison

Beyond Critique, Microsoft introduced Model Council, a feature that allows users to compare responses from different AI models side-by-side

1

. This tool hands prompts to several models and displays what each one says, highlighting where they agree or disagree

2

. The approach gives users more autonomy in workflow management and allows them to assemble the best possible response by drawing from multiple sources of intelligence.

Copilot Cowork Expands to Frontier Program

These updates arrive as Microsoft makes Copilot Cowork more widely available to members in its Frontier program, which provides customers with early access to cutting-edge AI features

1

. Copilot Cowork, built on Anthropic's viral Claude Cowork product, represents a move toward agentic AI that goes beyond simple chatbot interactions to become a digital assistant capable of handling long-running, multi-step tasks

2

. Microsoft unveiled Copilot Cowork in testing mode earlier this month, capitalizing on growing demand for autonomous AI agents

1

.

Driving Adoption Amid Intense Enterprise AI Competition

The timing reflects Microsoft's urgency to accelerate adoption among its existing business customer base. The company reported 15 million paid Copilot seats in January—roughly 3.3% of its 450 million commercial Microsoft 365 users

4

. This low penetration rate comes amid intensifying competition in enterprise AI, with Google expanding Gemini across Workspace apps and Anthropic's Claude gaining traction among businesses

4

. Jared Spataro, Microsoft AI at Work Chief Marketing Officer, framed the shift: "When intelligence and trust move together, AI stops being an experiment and starts becoming how work gets done"

5

.

Today's Top Stories

TheOutpost.ai

Your Daily Dose of Curated AI News

Don’t drown in AI news. We cut through the noise - filtering, ranking and summarizing the most important AI news, breakthroughs and research daily. Spend less time searching for the latest in AI and get straight to action.

© 2026 Triveous Technologies Private Limited
Instagram logo
LinkedIn logo