3 Sources
[1]
Microsoft's New On-Device AI Model Can Control Your PC
In a potential preview of the future, Microsoft's newest AI model can not only run natively on your PC but is also smart enough to complete tasks for you, like buying products online. On Monday, the company released the experimental Fara-7B AI model, describing it as Microsoft's first "agentic" small language model "designed specifically for computer use," including controlling the mouse and keyboard.
Fara-7B has 7 billion parameters, making it significantly smaller than OpenAI's GPT-3 model from 2020, which had 175 billion parameters. (The parameter counts of OpenAI's more recent models are not public.) But despite its relatively small size, Microsoft's experimental model "achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems that depend on prompting multiple large models," the company said. This includes outperforming OpenAI's GPT-4o when it was configured for online browsing.
Microsoft's new AI model works "by visually perceiving a web page," enabling it to understand and take actions on a PC's desktop. "It does not rely on separate models to parse the screen, nor on any additional information like accessibility trees, and thus uses the same modalities as humans to interact with the computer," Microsoft added.
In a blog post, Microsoft demoed the Fara-7B model by publishing three videos that show it buying a product online, searching for information and summarizing the result, and using online maps to find the distance between two locations. Fara-7B seems to respond and act more slowly than a human, and it requires the user to approve certain steps, such as entering account login information. Still, the demos offer a glimpse of how future AI models could automate many day-to-day tasks for users as the technology becomes smarter.
Microsoft's Copilot assistant for Windows 11 can also act as an "agent" and execute tasks on behalf of the user.
The key difference is that Copilot requires an internet connection to the company's power-hungry data centers. It also needs to collect data from your PC, which can raise privacy concerns, although Microsoft has policies in place to protect user data. In contrast, Fara-7B can run natively. "This results in reduced latency and improved privacy, as user data remains local," according to Microsoft.
That said, Fara-7B is far from flawless. Microsoft's own testing found occasional problems with "accuracy on more complex tasks, mistakes in following instructions, and susceptibility to hallucinations." That's why Microsoft is advising interested users to test the AI model only "in a sandboxed environment, monitoring its execution, and avoiding sensitive data or high-risk domains." The company has also built safeguards into Fara-7B that cause it to refuse malicious tasks.
Fara-7B builds on Microsoft's previous efforts to create on-device AI programs. Last year, the company released Phi-3, a small language model compact enough to be stored on a smartphone. Fara-7B itself is being released as a 16.6GB file meant to be used with Magentic-UI, Microsoft's AI research testing platform. In addition, the company plans to release a version of Fara-7B that can run on Windows 11 Copilot+ PCs, which feature dedicated AI processing.
[2]
Microsoft's Fara-7B brings AI agents to the PC with on-device automation
The model can interpret on-screen visuals, automate tasks directly on the device, and give enterprises a more affordable alternative to cloud-dependent AI agents. Microsoft is pushing agentic AI deeper into the PC with Fara-7B, a compact computer-use agent (CUA) model that can automate complex tasks entirely on a local device. The experimental release, aimed at gathering feedback, provides enterprises with a preview of how AI agents might run sensitive workflows without sending data to the cloud, while still matching or outperforming larger models like GPT-4o in real UI navigation tasks. "Unlike traditional chat models that generate text-based responses, Computer Use Agent (CUA) models like Fara-7B leverage computer interfaces, such as a mouse and keyboard, to complete tasks on behalf of users," Microsoft said in a blog post. "With only 7 billion parameters, Fara-7B achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems that depend on prompting multiple large models."
[3]
Microsoft's Fara-7B is a computer-use AI agent that rivals GPT-4o and works directly on your PC
Microsoft has introduced Fara-7B, a new 7-billion-parameter model designed to act as a Computer Use Agent (CUA) capable of performing complex tasks directly on a user's device. Fara-7B sets new state-of-the-art results for its size, providing a way to build AI agents that don't rely on massive, cloud-dependent models and can run on compact systems with lower latency and enhanced privacy.
While the model is an experimental release, its architecture addresses a primary barrier to enterprise adoption: data security. Because Fara-7B is small enough to run locally, it allows users to automate sensitive workflows, such as managing internal accounts or processing sensitive company data, without that information ever leaving the device.
How Fara-7B sees the web
Fara-7B is designed to navigate user interfaces using the same tools a human does: a mouse and keyboard. The model operates by visually perceiving a web page through screenshots and predicting specific coordinates for actions like clicking, typing, and scrolling. Crucially, Fara-7B does not rely on "accessibility trees," the underlying code structure that browsers use to describe web pages to screen readers. Instead, it relies solely on pixel-level visual data. This approach allows the agent to interact with websites even when the underlying code is obfuscated or complex.
According to Yash Lara, Senior PM Lead at Microsoft Research, processing all visual input on-device creates true "pixel sovereignty," since screenshots and the reasoning needed for automation remain on the user's device. "This approach helps organizations meet strict requirements in regulated sectors, including HIPAA and GLBA," he told VentureBeat in written comments.
In benchmarking tests, this visual-first approach has yielded strong results. On WebVoyager, a standard benchmark for web agents, Fara-7B achieved a task success rate of 73.5%.
This outperforms larger, more resource-intensive systems, including GPT-4o when prompted to act as a computer use agent (65.1%), and the native UI-TARS-1.5-7B model (66.4%).
Efficiency is another key differentiator. In comparative tests, Fara-7B completed tasks in approximately 16 steps on average, compared to roughly 41 steps for the UI-TARS-1.5-7B model.
Handling risks
The transition to autonomous agents is not without risks, however. Microsoft notes that Fara-7B shares limitations common to other AI models, including potential hallucinations, mistakes in following complex instructions, and accuracy degradation on intricate tasks.
To mitigate these risks, the model was trained to recognize "Critical Points." A Critical Point is defined as any situation requiring a user's personal data or consent before an irreversible action occurs, such as sending an email or completing a financial transaction. Upon reaching such a juncture, Fara-7B is designed to pause and explicitly request user approval before proceeding.
Managing this interaction without frustrating the user is a key design challenge. "Balancing robust safeguards such as Critical Points with seamless user journeys is key," Lara said. "Having a UI, like Microsoft Research's Magentic-UI, is vital for giving users opportunities to intervene when necessary, while also helping to avoid approval fatigue." Magentic-UI is a research prototype designed specifically to facilitate these human-agent interactions. Fara-7B is designed to run in Magentic-UI.
Distilling complexity into a single model
The development of Fara-7B highlights a growing trend in knowledge distillation, where the capabilities of a complex system are compressed into a smaller, more efficient model. Creating a CUA usually requires massive amounts of training data showing how to navigate the web. Collecting this data via human annotation is prohibitively expensive.
To solve this, Microsoft used a synthetic data pipeline built on Magentic-One, a multi-agent framework. In this setup, an "Orchestrator" agent created plans and directed a "WebSurfer" agent to browse the web, generating 145,000 successful task trajectories. The researchers then "distilled" this complex interaction data into Fara-7B, which is built on Qwen2.5-VL-7B, a base model chosen for its long context window (up to 128,000 tokens) and its strong ability to connect text instructions to visual elements on a screen.
While the data generation required a heavy multi-agent system, Fara-7B itself is a single model, showing that a small model can effectively learn advanced behaviors without needing complex scaffolding at runtime. The training process relied on supervised fine-tuning, where the model learns by mimicking the successful examples generated by the synthetic pipeline.
Looking forward
While the current version was trained on static datasets, future iterations will focus on making the model smarter, not necessarily bigger. "Moving forward, we'll strive to maintain the small size of our models," Lara said. "Our ongoing research is focused on making agentic models smarter and safer, not just larger." This includes exploring techniques like reinforcement learning (RL) in live, sandboxed environments, which would allow the model to learn from trial and error in real time.
Microsoft has made the model available on Hugging Face and Microsoft Foundry under an MIT license. However, Lara cautions that while the license allows for commercial use, the model is not yet production-ready. "You can freely experiment and prototype with Fara-7B under the MIT license," he says, "but it's best suited for pilots and proofs-of-concept rather than mission-critical deployments."
Microsoft releases Fara-7B, an experimental 7-billion parameter AI model that can autonomously control computers through visual perception and mouse/keyboard interactions. The compact model runs entirely on-device, offering enhanced privacy and reduced latency while outperforming larger cloud-based systems like GPT-4o in web navigation tasks.

Microsoft has unveiled Fara-7B, an experimental artificial intelligence model that can operate a computer on a user's behalf. The 7-billion-parameter model is Microsoft's first "agentic" small language model designed specifically for computer use, capable of controlling the mouse and keyboard entirely on a local device [1].
Unlike traditional chat models that generate text-based responses, Fara-7B is a Computer Use Agent (CUA): it completes tasks directly on users' devices by operating the mouse and keyboard, without depending on cloud-hosted models. It works by visually perceiving web pages and acting on the desktop, using the same modalities humans use to interact with computers [1][2].
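The observe-and-act cycle described above can be illustrated with a minimal Python sketch. It assumes a loop in which the agent captures a screenshot, asks the model for a single next action given the actions taken so far, and executes it until the model signals completion. `run_agent`, `capture_screen`, `predict_action`, `execute`, the `Action` record, and the 30-step cap are hypothetical names and values, not Fara-7B's published interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical action record; the sources do not publish Fara-7B's action format.
@dataclass
class Action:
    kind: str         # e.g. "click", "type", "scroll", or "terminate"
    detail: str = ""  # free-form description of the target or text


@dataclass
class Episode:
    actions: List[Action] = field(default_factory=list)


def run_agent(
    capture_screen: Callable[[], bytes],
    predict_action: Callable[[bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 30,  # hypothetical safety cap on episode length
) -> Episode:
    """Observe a screenshot, ask the model for one action, execute it, repeat."""
    episode = Episode()
    for _ in range(max_steps):
        screenshot = capture_screen()  # raw pixels only; no accessibility tree
        action = predict_action(screenshot, episode.actions)
        episode.actions.append(action)
        if action.kind == "terminate":
            break  # the model signals that the task is complete
        execute(action)  # e.g. move the mouse or press keys
    return episode


# Demo with stand-in callables instead of a real model and desktop.
script = [Action("click", "search box"), Action("type", "hiking boots"), Action("terminate")]
executed: List[Action] = []
episode = run_agent(
    capture_screen=lambda: b"fake-screenshot",
    predict_action=lambda shot, history: script[len(history)],
    execute=executed.append,
)
```

In the demo, the scripted "model" returns one action per step, so the episode records three actions while only the click and the typing are executed.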
Fara-7B is far smaller than OpenAI's GPT-3 from 2020, which had 175 billion parameters [1], yet it performs strongly on web navigation. On WebVoyager, a standard benchmark for web agents, it achieved a 73.5% task success rate, ahead of GPT-4o prompted to act as a computer use agent (65.1%) and the similarly sized UI-TARS-1.5-7B model (66.4%) [3].
Fara-7B is also more efficient: it completes tasks in approximately 16 steps on average, compared with roughly 41 steps for UI-TARS-1.5-7B [3].
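The reported figures can be compared directly. The short Python calculation below uses only the numbers cited above; it computes Fara-7B's lead in percentage points and how many fewer steps it needs than UI-TARS-1.5-7B. Because the step counts are reported as approximate averages, the resulting ratio is approximate as well.

```python
# Figures as reported in the sources for WebVoyager; nothing here is re-measured.
success_rate = {"Fara-7B": 73.5, "GPT-4o (as CUA)": 65.1, "UI-TARS-1.5-7B": 66.4}
avg_steps = {"Fara-7B": 16, "UI-TARS-1.5-7B": 41}  # both approximate averages


def margin_points(model: str, baseline: str) -> float:
    """Difference in task success rate, in percentage points."""
    return round(success_rate[model] - success_rate[baseline], 1)


vs_gpt4o = margin_points("Fara-7B", "GPT-4o (as CUA)")  # 8.4 points
vs_uitars = margin_points("Fara-7B", "UI-TARS-1.5-7B")  # 7.1 points

# Step efficiency relative to UI-TARS-1.5-7B, the only step comparison reported.
steps_ratio = avg_steps["UI-TARS-1.5-7B"] / avg_steps["Fara-7B"]      # 2.5625
fewer_steps = 1 - avg_steps["Fara-7B"] / avg_steps["UI-TARS-1.5-7B"]  # about 0.61

print(f"Fara-7B leads by {vs_gpt4o} and {vs_uitars} points; "
      f"UI-TARS-1.5-7B takes {steps_ratio:.1f}x as many steps ({fewer_steps:.0%} fewer for Fara-7B)")
```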
Fara-7B's architecture relies on a pixel-level visual approach, processing screenshots to predict specific coordinates for actions like clicking, typing, and scrolling. Crucially, the model does not depend on accessibility trees or underlying code structures, allowing it to interact with websites even when the underlying code is obfuscated or complex [3].
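To make the coordinate-based approach concrete, here is a hedged Python sketch of how an agent harness might turn a model's predicted action into a validated, pixel-addressed command. The JSON action format, the `parse_action` function, and the `ScreenAction` fields are assumptions for illustration; the sources do not describe Fara-7B's actual output format.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical schema; Fara-7B's real output format is not described in the sources.
@dataclass(frozen=True)
class ScreenAction:
    kind: str                                # "click", "type", or "scroll"
    point: Optional[Tuple[int, int]] = None  # pixel coordinate on the screenshot
    text: str = ""                           # text to type, if any
    dy: int = 0                              # scroll amount in pixels


def parse_action(raw: str, width: int, height: int) -> ScreenAction:
    """Turn a JSON action string into a validated, pixel-addressed action."""
    data = json.loads(raw)
    kind = data["action"]
    if kind not in {"click", "type", "scroll"}:
        raise ValueError(f"unsupported action: {kind}")
    point = None
    if "coordinate" in data:
        x, y = (int(v) for v in data["coordinate"])
        # Coordinates refer to the screenshot the model saw, so reject off-screen points.
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"coordinate {(x, y)} outside {width}x{height} screen")
        point = (x, y)
    if kind == "click" and point is None:
        raise ValueError("click requires a coordinate")
    return ScreenAction(kind, point, data.get("text", ""), int(data.get("dy", 0)))


action = parse_action('{"action": "click", "coordinate": [412, 208]}', width=1280, height=720)
```

Validating against the screenshot's dimensions reflects the fact that, with no accessibility tree, a coordinate is the only thing tying the model's prediction to an element on screen.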
The model is built on Qwen2.5-VL-7B, chosen for its long context window of up to 128,000 tokens and its strong ability to connect text instructions to visual elements on screen. Microsoft developed Fara-7B through knowledge distillation: its Magentic-One multi-agent framework, in which an "Orchestrator" agent directs a "WebSurfer" agent, generated 145,000 successful task trajectories, and Fara-7B was trained on them with supervised fine-tuning [3].
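The distillation step can be sketched as data preparation: keep only successful multi-agent trajectories and turn each step into a supervised example in which a single model must predict the next action from the task, the action history, and the current screenshot. The `Trajectory` and `Step` records, the `succeeded` flag, and `to_sft_examples` are hypothetical; the sources do not describe how Microsoft judged success or formatted its training examples.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical records; the sources only say Magentic-One produced 145,000
# successful trajectories that Fara-7B was then fine-tuned to imitate.
@dataclass
class Step:
    screenshot_id: str  # reference to the screenshot the agent saw
    action: str         # the action the WebSurfer agent took


@dataclass
class Trajectory:
    task: str
    steps: List[Step]
    succeeded: bool  # how success is judged is an assumption here


Example = Tuple[str, List[str], str, str]  # (task, history, screenshot, target action)


def to_sft_examples(trajectories: List[Trajectory]) -> List[Example]:
    """Keep successful runs and emit one supervised example per step."""
    examples: List[Example] = []
    for traj in trajectories:
        if not traj.succeeded:
            continue  # only successful trajectories are imitated
        history: List[str] = []
        for step in traj.steps:
            examples.append((traj.task, list(history), step.screenshot_id, step.action))
            history.append(step.action)
    return examples
```

Each example pairs everything the single model would see at that moment with the action the multi-agent system actually took, which is why no orchestrator is needed at runtime.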
The on-device operation of Fara-7B addresses critical enterprise concerns about data security and privacy. According to Yash Lara, Senior PM Lead at Microsoft Research, processing all visual input on-device creates "pixel sovereignty," ensuring that screenshots and automation reasoning remain on the user's device. This approach helps organizations meet strict regulatory requirements, including HIPAA and GLBA [3].
The local processing capability results in reduced latency and improved privacy compared to cloud-dependent alternatives like Microsoft's own Copilot assistant, which requires internet connectivity and data collection from users' PCs [1].
Recognizing the potential risks of autonomous AI agents, Microsoft has built safeguards into Fara-7B. The model is trained to recognize "Critical Points": situations that require a user's personal data or consent before an irreversible action occurs, such as sending an email or completing a financial transaction. Upon reaching such a juncture, Fara-7B pauses and explicitly requests user approval before proceeding [3]. The model is also designed to refuse malicious tasks [1].
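A Critical Point gate can be sketched as a check that runs before each action. In Fara-7B the model itself is trained to recognize these moments; the Python sketch below substitutes a simple hard-coded rule list so that the control flow is visible. The `IRREVERSIBLE` set, `ProposedAction`, and `execute_with_critical_points` are hypothetical names, not part of Fara-7B or Magentic-UI.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical rule list standing in for the model's learned judgment; sending an
# email and completing a financial transaction are the examples the sources give.
IRREVERSIBLE = {"send_email", "submit_payment"}


@dataclass
class ProposedAction:
    kind: str
    description: str
    needs_personal_data: bool = False


def execute_with_critical_points(
    actions: List[ProposedAction],
    ask_user: Callable[[ProposedAction], bool],
    execute: Callable[[ProposedAction], None],
) -> List[str]:
    """Run actions in order, pausing for explicit approval at each Critical Point."""
    log: List[str] = []
    for action in actions:
        critical = action.kind in IRREVERSIBLE or action.needs_personal_data
        if critical and not ask_user(action):
            log.append(f"stopped before {action.kind}: user declined")
            break  # never perform the irreversible step without consent
        execute(action)
        log.append(f"done {action.kind}")
    return log
```

Routine steps run without interruption and only flagged steps trigger a prompt, which mirrors the stated goal of avoiding approval fatigue.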
Microsoft acknowledges that Fara-7B shares common AI limitations, including potential hallucinations, mistakes in following complex instructions, and accuracy degradation on intricate tasks. The company advises users to test the model only in sandboxed environments while monitoring its execution and avoiding sensitive data or high-risk domains [1].
Microsoft is releasing Fara-7B as a 16.6GB file designed to work with Magentic-UI, a Microsoft Research prototype for human-agent interaction [1][3]. The company also plans to release a version that can run on Windows 11 Copilot+ PCs, which feature dedicated AI processing capabilities [1].
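The 16.6GB figure can be sanity-checked with simple arithmetic: raw weight size is roughly the parameter count times the bytes stored per parameter. The Python sketch below assumes 16-bit (bf16) weights, which the sources do not confirm. Under that assumption, 7 billion parameters occupy about 14 GB, and a 16.6GB file would correspond to roughly 8.3 billion 16-bit values; the sources do not say what accounts for the difference, whether additional model components, other files, or a different storage format.

```python
# Approximate bytes needed to store one parameter at a given precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0}


def weights_gb(params: float, dtype: str) -> float:
    """Raw weight size in gigabytes (1 GB = 1e9 bytes), ignoring file overhead."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def implied_params(file_gb: float, dtype: str) -> float:
    """Values a file of this size could hold if it were all weights at `dtype`."""
    return file_gb * 1e9 / BYTES_PER_PARAM[dtype]


seven_b_bf16 = weights_gb(7e9, "bf16")                # 14.0 GB
implied_billion = implied_params(16.6, "bf16") / 1e9  # roughly 8.3 billion values
```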
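The experimental release is intended to gather feedback [2]. Fara-7B is available on Hugging Face and Microsoft Foundry under an MIT license that permits commercial use, though Lara describes it as best suited for pilots and proofs of concept rather than mission-critical deployments. Microsoft says future work will focus on making its agentic models smarter and safer rather than larger, maintaining their compact size [3].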
Summarized by Navi
