4 Sources
[1]
Microsoft's New On-Device AI Model Can Control Your PC
In a potential preview of the future, Microsoft's newest AI model can not only run natively on your PC, but is smart enough to complete tasks for you, like buying products online. On Monday, the company released the experimental Fara-7B AI model, describing it as Microsoft's first "agentic" small language model "designed specifically for computer use," including controlling the mouse and keyboard.

Fara-7B spans 7 billion parameters, making it significantly smaller than OpenAI's GPT-3 model from 2020, which featured 175 billion parameters. (We don't know the exact parameter counts of OpenAI's more recent models.) But despite its relatively small size, Microsoft's experimental model "achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems that depend on prompting multiple large models," the company said. This includes outperforming OpenAI's GPT-4o when it was configured for online browsing.

Microsoft's new AI model works "by visually perceiving a web page," enabling it to understand and take actions on a PC's desktop. "It does not rely on separate models to parse the screen, nor on any additional information like accessibility trees, and thus uses the same modalities as humans to interact with the computer," Microsoft added.

In a blog post, Microsoft demoed the Fara-7B model by publishing three videos that show it buying a product online, searching for information and summarizing the result, and using online maps to find the distance between two locations. Fara-7B seems to respond and act more slowly than a human, and it requires the user to approve certain steps, such as entering account login information. Still, the demos offer a glimpse at how future AI models could automate and execute many day-to-day tasks for users as the technologies become smarter.

Microsoft's Copilot assistant for Windows 11 can also act as an "agent" and execute tasks on behalf of the user. The key difference is that Copilot requires an internet connection to the company's power-hungry data centers. It also needs to collect data from your PC, which can raise privacy concerns, although Microsoft has policies in place to protect user data. In contrast, Fara-7B can run natively. "This results in reduced latency and improved privacy, as user data remains local," according to Microsoft.

That said, Fara-7B is far from flawless. Microsoft's own testing found occasional errors with "accuracy on more complex tasks, mistakes in following instructions, and susceptibility to hallucinations." That's why Microsoft is advising interested users to test the AI model only "in a sandboxed environment, monitoring its execution, and avoiding sensitive data or high-risk domains." The company has also built safeguards into Fara-7B, which will cause it to refuse to execute malicious tasks.

Fara-7B builds on Microsoft's previous efforts to create on-device AI programs. Last year, the company released Phi-3, a chatbot-style AI model small enough to be stored on a smartphone. In the case of Fara-7B, the company is releasing the AI model as a 16.6GB file, which is meant to be used with Magentic-UI, Microsoft's AI research testing platform. In addition, the company plans on releasing a Fara-7B model that can run on Windows 11 Copilot+ PCs, which feature dedicated AI processing.
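As a rough sanity check on that file size (our own arithmetic, not a figure Microsoft has published): weights stored at 16 bits take two bytes each, so a 16.6GB file corresponds to roughly 8.3 billion values, which is consistent with a 7-billion-parameter language model plus the additional weights of its vision encoder.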
[2]
Microsoft's Fara-7B brings AI agents to the PC with on-device automation
The model can interpret on-screen visuals, automate tasks directly on the device, and give enterprises a more affordable alternative to cloud-dependent AI agents.

Microsoft is pushing agentic AI deeper into the PC with Fara-7B, a compact computer-use agent (CUA) model that can automate complex tasks entirely on a local device. The experimental release, aimed at gathering feedback, provides enterprises with a preview of how AI agents might run sensitive workflows without sending data to the cloud, while still matching or outperforming larger models like GPT-4o in real UI navigation tasks.

"Unlike traditional chat models that generate text-based responses, Computer Use Agent (CUA) models like Fara-7B leverage computer interfaces, such as a mouse and keyboard, to complete tasks on behalf of users," Microsoft said in a blog post. "With only 7 billion parameters, Fara-7B achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems that depend on prompting multiple large models."
[3]
Microsoft's Fara-7B is a computer-use AI agent that rivals GPT-4o and works directly on your PC
Microsoft has introduced Fara-7B, a new 7-billion parameter model designed to act as a Computer Use Agent (CUA) capable of performing complex tasks directly on a user's device. Fara-7B sets new state-of-the-art results for its size, providing a way to build AI agents that don't rely on massive, cloud-dependent models and can run on compact systems with lower latency and enhanced privacy. While the model is an experimental release, its architecture addresses a primary barrier to enterprise adoption: data security. Because Fara-7B is small enough to run locally, it allows users to automate sensitive workflows, such as managing internal accounts or processing sensitive company data, without that information ever leaving the device.

How Fara-7B sees the web

Fara-7B is designed to navigate user interfaces using the same tools a human does: a mouse and keyboard. The model operates by visually perceiving a web page through screenshots and predicting specific coordinates for actions like clicking, typing, and scrolling. Crucially, Fara-7B does not rely on "accessibility trees," the underlying code structure that browsers use to describe web pages to screen readers. Instead, it relies solely on pixel-level visual data. This approach allows the agent to interact with websites even when the underlying code is obfuscated or complex.

According to Yash Lara, Senior PM Lead at Microsoft Research, processing all visual input on-device creates true "pixel sovereignty," since screenshots and the reasoning needed for automation remain on the user's device. "This approach helps organizations meet strict requirements in regulated sectors, including HIPAA and GLBA," he told VentureBeat in written comments.

In benchmarking tests, this visual-first approach has yielded strong results. On WebVoyager, a standard benchmark for web agents, Fara-7B achieved a task success rate of 73.5%. This outperforms larger, more resource-intensive systems, including GPT-4o when prompted to act as a computer-use agent (65.1%) and the native UI-TARS-1.5-7B model (66.4%). Efficiency is another key differentiator: in comparative tests, Fara-7B completed tasks in approximately 16 steps on average, compared with roughly 41 steps for the UI-TARS-1.5-7B model.

Handling risks

The transition to autonomous agents is not without risks, however. Microsoft notes that Fara-7B shares limitations common to other AI models, including potential hallucinations, mistakes in following complex instructions, and accuracy degradation on intricate tasks. To mitigate these risks, the model was trained to recognize "Critical Points." A Critical Point is defined as any situation requiring a user's personal data or consent before an irreversible action occurs, such as sending an email or completing a financial transaction. Upon reaching such a juncture, Fara-7B is designed to pause and explicitly request user approval before proceeding.

Managing this interaction without frustrating the user is a key design challenge. "Balancing robust safeguards such as Critical Points with seamless user journeys is key," Lara said. "Having a UI, like Microsoft Research's Magentic-UI, is vital for giving users opportunities to intervene when necessary, while also helping to avoid approval fatigue." Magentic-UI is a research prototype designed specifically to facilitate these human-agent interactions, and Fara-7B is designed to run inside it.
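Taken together, the behavior described above amounts to a simple perception-action loop: capture a screenshot, ask the model for the next action (a click coordinate, text to type, a scroll, or a pause at a Critical Point), execute it, and repeat. The Python sketch below illustrates that loop under stated assumptions: the Playwright calls are real browser-automation APIs, but the action schema and the predict_action() helper are hypothetical stand-ins for the model, not Fara-7B's actual interface.

# Illustrative perception-action loop for a screenshot-driven computer-use agent.
# Assumptions: predict_action() and the action dictionary schema are hypothetical;
# only the Playwright calls are real APIs.
from playwright.sync_api import sync_playwright

def predict_action(screenshot_png: bytes, task: str) -> dict:
    """Hypothetical model call. Would return something like
    {"type": "click", "x": 412, "y": 230}, {"type": "type", "text": "..."},
    {"type": "scroll", "dy": 600}, {"type": "critical_point", "reason": "..."},
    or {"type": "done"}."""
    raise NotImplementedError

def run_task(task: str, start_url: str, max_steps: int = 50) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto(start_url)
        for _ in range(max_steps):
            action = predict_action(page.screenshot(), task)
            if action["type"] == "done":
                break
            if action["type"] == "critical_point":
                # Pause before sensitive or irreversible steps and ask the user.
                if input(f"Approve step ({action['reason']})? [y/N] ").lower() != "y":
                    break
            elif action["type"] == "click":
                page.mouse.click(action["x"], action["y"])
            elif action["type"] == "type":
                page.keyboard.type(action["text"])
            elif action["type"] == "scroll":
                page.mouse.wheel(0, action.get("dy", 600))
        browser.close()

Keeping the loop outside the model is what lets a harness such as Magentic-UI surface each proposed action for the user to approve or reject before it runs.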
Distilling complexity into a single model

The development of Fara-7B highlights a growing trend in knowledge distillation, where the capabilities of a complex system are compressed into a smaller, more efficient model. Creating a CUA usually requires massive amounts of training data showing how to navigate the web, and collecting this data via human annotation is prohibitively expensive. To solve this, Microsoft used a synthetic data pipeline built on Magentic-One, a multi-agent framework. In this setup, an "Orchestrator" agent created plans and directed a "WebSurfer" agent to browse the web, generating 145,000 successful task trajectories.

The researchers then "distilled" this complex interaction data into Fara-7B, which is built on Qwen2.5-VL-7B, a base model chosen for its long context window (up to 128,000 tokens) and its strong ability to connect text instructions to visual elements on a screen. While the data generation required a heavy multi-agent system, Fara-7B itself is a single model, showing that a small model can effectively learn advanced behaviors without needing complex scaffolding at runtime. The training process relied on supervised fine-tuning, where the model learns by mimicking the successful examples generated by the synthetic pipeline.

Looking forward

While the current version was trained on static datasets, future iterations will focus on making the model smarter, not necessarily bigger. "Moving forward, we'll strive to maintain the small size of our models," Lara said. "Our ongoing research is focused on making agentic models smarter and safer, not just larger." This includes exploring techniques like reinforcement learning (RL) in live, sandboxed environments, which would allow the model to learn from trial and error in real time.

Microsoft has made the model available on Hugging Face and Microsoft Foundry under an MIT license. However, Lara cautions that while the license allows for commercial use, the model is not yet production-ready. "You can freely experiment and prototype with Fara‑7B under the MIT license," he says, "but it's best suited for pilots and proofs‑of‑concept rather than mission‑critical deployments."
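As a rough illustration of the distillation step described above, each successful multi-agent trajectory can be flattened into per-step supervised examples: the task, the action history so far, and the current screenshot form the input, and the action the WebSurfer actually took becomes the target. The sketch below is schematic only; the Trajectory and Step types and the record fields are invented for illustration and are not Microsoft's actual pipeline format.

# Schematic flattening of multi-agent browsing trajectories into SFT examples.
# All field names and types here are invented for illustration.
import json
from dataclasses import dataclass

@dataclass
class Step:
    screenshot_path: str  # what the agent saw before acting
    action: dict          # e.g. {"type": "click", "x": 412, "y": 230}

@dataclass
class Trajectory:
    task: str             # the instruction the Orchestrator was pursuing
    steps: list[Step]
    success: bool         # only successful trajectories are kept for training

def to_sft_examples(traj: Trajectory) -> list[dict]:
    """One training example per step: (task, prior actions, screenshot) -> next action."""
    if not traj.success:
        return []
    examples, history = [], []
    for step in traj.steps:
        examples.append({
            "task": traj.task,
            "previous_actions": list(history),
            "image": step.screenshot_path,
            "target_action": json.dumps(step.action),
        })
        history.append(step.action)
    return examples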
[4]
Microsoft Unveils Fara-7B Agentic Model Built on Qwen for Computer Use | AIM
The model is trained on 145,000 synthetic trajectories generated through the Magentic-One framework.

Microsoft has launched Fara-7B, its first small language model built to operate a computer the way a person does. The company claims the 7-billion-parameter model matches or beats larger agentic systems on live web tasks while running locally with lower latency and stronger privacy.

Fara-7B reads a webpage visually and completes tasks by clicking, typing and scrolling on predicted coordinates. It does not rely on accessibility trees or separate parsing layers. Microsoft says the model finishes tasks in about 16 steps on average, which is far fewer than many comparable systems. The model is trained on 145,000 synthetic trajectories generated through the Magentic-One framework and is built on Qwen2.5-VL-7B with supervised fine-tuning.

The company positions Fara-7B as an everyday computer-use agent that can search, summarise, fill forms, manage accounts, book tickets, shop online, compare prices and find jobs or real estate listings. Microsoft is also releasing WebTailBench, a new test set with 609 real-world tasks across 11 categories. Fara-7B leads all computer-use models across every segment, including shopping, flights, hotels, restaurants and multi-step comparison tasks.

The company offers two ways to run the model. Azure Foundry hosting lets users deploy Fara-7B without downloading weights or using their own GPUs. Advanced users can self-host through vLLM on GPU hardware. The evaluation stack relies on Playwright and an abstract agent interface that can plug in any model. Microsoft warns that Fara-7B is an experimental release and should be run in sandboxed settings without sensitive data.

Earlier this year, Microsoft launched Phi-4-multimodal and Phi-4-mini, the latest additions to its Phi family of small language models (SLMs). Last month, Google DeepMind released the Gemini 2.5 Computer Use model, a specialised version of its Gemini 2.5 Pro AI that can interact with user interfaces. The model is available in preview via the Gemini API through Google AI Studio and Vertex AI Studio.
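For the self-hosting route, one plausible pattern is to serve the weights with vLLM's OpenAI-compatible server and send screenshots as image inputs. This is a sketch under assumptions: the Hugging Face model ID "microsoft/Fara-7B", the port, and the prompt are placeholders to check against the actual model card, not details confirmed by the article.

# Self-hosting sketch (model ID and prompt format are placeholder assumptions).
# First start vLLM's OpenAI-compatible server, e.g.:
#   vllm serve microsoft/Fara-7B --port 8000
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="microsoft/Fara-7B",  # placeholder; use the published checkpoint name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Task: find the distance between two addresses. Return the next UI action."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)

In principle, the same request shape should work against a hosted endpoint by swapping the base_url and api_key, though that depends on the hosting service's API surface.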
Microsoft introduces Fara-7B, a 7-billion parameter AI model that can autonomously control computers through visual perception, offering local processing for enhanced privacy and competitive performance against larger cloud-based systems.
Microsoft has unveiled Fara-7B, marking a significant milestone as the company's first "agentic" small language model specifically designed for computer control [1]. The 7-billion parameter model represents a breakthrough in on-device AI capabilities, enabling autonomous computer operation through visual perception and direct hardware interaction.
Unlike traditional AI assistants that require cloud connectivity, Fara-7B operates entirely on local devices, addressing critical privacy and latency concerns that have hindered enterprise adoption of AI agents [2]. The model can perform complex tasks including online shopping, information searches, form filling, and account management without transmitting sensitive data to external servers.

Fara-7B operates through a visual-first approach, interpreting web pages and desktop interfaces through screenshot analysis rather than relying on accessibility trees or underlying code structures [3]. This pixel-level visual processing enables the model to interact with any interface, even when code is obfuscated or complex.
Built on the Qwen2.5-VL-7B foundation model, Fara-7B was trained using 145,000 synthetic trajectories generated through Microsoft's Magentic-One framework [4]. The training process involved an "Orchestrator" agent creating plans and directing a "WebSurfer" agent to browse the web, with successful interactions then distilled into the compact model.

Benchmark results demonstrate Fara-7B's strong performance, with a 73.5% task success rate on WebVoyager, outperforming GPT-4o's 65.1% when configured for computer use [3]. The model is also efficient, completing tasks in approximately 16 steps on average, significantly fewer than comparable systems like UI-TARS-1.5-7B, which requires roughly 41 steps.

Recognizing the potential risks of autonomous computer control, Microsoft has implemented safety measures within Fara-7B. The model incorporates "Critical Points" detection, automatically pausing execution when it encounters situations that require personal data input or user consent before irreversible actions [3].

Microsoft acknowledges that Fara-7B shares common AI limitations, including potential hallucinations, instruction-following errors, and accuracy degradation on complex tasks [1]. The company strongly recommends testing the experimental model only in sandboxed environments while avoiding sensitive data or high-risk domains.

The model addresses a primary barrier to enterprise AI adoption by enabling sensitive workflow automation without cloud dependency [2]. Organizations in regulated sectors, including those subject to HIPAA and GLBA requirements, can leverage Fara-7B's "pixel sovereignty" approach to maintain data compliance while automating routine tasks.
Microsoft has also released WebTailBench, a comprehensive test set featuring 609 real-world tasks across 11 categories, where Fara-7B demonstrates leadership across all segments including shopping, travel booking, and multi-step comparison tasks [4].

Summarized by Navi