5 Sources
[1]
With Perplexity's Push for Hybrid AI, Your Laptop Could Function as a Data Center
With more than a decade of experience, Nelson covers Apple and Google and writes about iPhone and Android features, privacy and security settings, and more.
Perplexity, an AI-powered search and answer engine, has a new way to turn personal devices into decentralized data centers. The company said Tuesday that it's adding a new hybrid local-server system to Personal Computer, its AI agent that can work across files, apps and the web. Starting in July, the system will automatically decide which parts of a task should run directly on a user's device and which should be sent to more powerful AI models in the cloud. A smaller model running on the device could handle sensitive data and routine work, such as financial records, health information and personal files. More complicated work that requires the capabilities of a larger AI model could still be sent to a server. Perplexity says its system will make that decision automatically, breaking a larger task into smaller parts and routing each one to the appropriate place. Users won't need to choose between a local model and a cloud-based model before getting started.
Personal Computer is currently available through Perplexity's Mac app. It expands the company's existing Computer agent with features including local file editing, computer use and browsing through Perplexity's Comet browser. Perplexity also said that Personal Computer is coming to Windows. Although the app is currently available only on Mac, Perplexity is pitching the underlying technology as a broader system that can work across different types of hardware. The company said it unveiled the system with Intel and that the same framework runs on other local silicon, including Nvidia's RTX Spark platform. Moving more work onto users' devices could also reduce the amount of expensive cloud computing required to complete AI tasks.
Perplexity argues that routine work shouldn't consume the same data center resources as a request that genuinely needs one of the most capable AI models.
[2]
Perplexity splits AI inference between PCs and cloud to cut costs
Perplexity AI announced a platform at Computex that dynamically routes AI inference between PCs and cloud servers in real time, acting as an "air-traffic controller" for AI tasks. The chip-agnostic system targets the cost crisis of centralised inference as Perplexity's revenue hits $500 million.
Perplexity AI has developed a platform that dynamically splits AI workloads between personal computers and cloud servers, deciding in real time which tasks can run locally on a PC's processor and which need the power of data centre hardware. CEO Aravind Srinivas announced the system at Computex in Taipei on Tuesday, describing it as an "air-traffic controller for AI tasks" designed to reduce the cost of inference, the process of running trained AI models to generate responses. "You don't want all your compute centralised in servers and everything running through the largest models," Srinivas said in a Bloomberg Television interview. "You're already reading reports of how people are freaking out about their cost. Some people are spending half a billion dollars per month. What you actually want is efficient value per watt per user."
How it works
The system evaluates each AI task and routes it to the most efficient compute layer. Simple operations that modern PC processors can handle, such as summarisation, formatting, or lightweight classification, run locally without touching the cloud. More complex tasks that require large model inference, such as multi-step reasoning or retrieval-augmented generation across large datasets, get routed to cloud servers. The routing decision happens in real time, invisible to the user. The practical effect is that Perplexity can serve more users at lower cost by offloading a portion of inference work to the billions of PCs already in circulation. As AI inference demand strains data centre capacity and drives utilities to plan $1.4 trillion in grid upgrades, distributing compute to the edge is both an economic and infrastructure necessity.
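The "How it works" description boils down to a per-task routing rule: lightweight kinds of work stay on the PC, heavier kinds go to the cloud. The following is a minimal sketch, assuming tasks arrive already labelled with a kind; `Target`, `Task`, `route` and `LOCAL_KINDS` are hypothetical names, not Perplexity's API, and the real system reportedly makes this call in real time from much richer signals than a fixed lookup.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"  # run on the PC's own processor
    CLOUD = "cloud"  # send to data-centre servers


# Hypothetical task kinds, taken from the examples in the article. A production
# router would presumably estimate difficulty rather than use a fixed lookup.
LOCAL_KINDS = {"summarisation", "formatting", "classification"}


@dataclass
class Task:
    name: str
    kind: str


def route(task: Task) -> Target:
    """Keep lightweight kinds on the PC; send everything else to the cloud."""
    if task.kind in LOCAL_KINDS:
        return Target.LOCAL
    # Unknown or heavy kinds (multi-step reasoning, large-corpus RAG) go to the
    # cloud, where the larger models run.
    return Target.CLOUD


tasks = [
    Task("tidy meeting notes", "formatting"),
    Task("compare ten annual reports", "multi_step_reasoning"),
]
plan = {t.name: route(t).value for t in tasks}
print(plan)
```

Defaulting unknown kinds to the cloud trades cost for quality; a privacy-first router would make the opposite choice for anything that might contain sensitive data.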
Srinivas made the announcement alongside Intel CEO Lip-Bu Tan, whose company leads the market for PC processors and has a commercial interest in making PCs a meaningful AI compute layer. However, Srinivas said the platform is "chip agnostic" and works with Nvidia processors as well. Nvidia highlighted the same edge-inference trend at Computex with its new RTX Spark platform for AI-powered laptops and desktops.
The cost problem
Srinivas's reference to companies "spending half a billion dollars per month" on AI compute is not hyperbole. OpenAI's infrastructure costs have been widely reported at that scale, and Anthropic's projected $10.9 billion in Q2 revenue comes with substantial compute expenses that compress margins. The energy and cost burden of centralised AI inference is one of the defining constraints of the current AI boom. Perplexity's approach inverts the assumption that AI inference must happen in the cloud. By treating the PC as a first-class compute node rather than a thin client, the company can reduce its own server costs while potentially delivering faster responses for tasks that run locally. The tradeoff is complexity: the routing system must accurately assess task difficulty in milliseconds, and the quality of local inference depends on the user's hardware capabilities.
Revenue efficiency
Perplexity's financial trajectory underscores why cost efficiency matters. Srinivas posted on X in April that the company's revenue grew fivefold, from $100 million to $500 million, while headcount increased just 34%. That ratio (fivefold revenue growth against a 34% rise in headcount, or roughly 3.7 times as much revenue per employee) reflects both the leverage of AI-native business models and Perplexity's position as an aggregator that routes queries across multiple AI providers rather than training its own frontier models. "Every time any of the AI gets better, our unified system also gets better because we route across all of them," Srinivas said.
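The revenue-versus-headcount comparison can be made concrete as the change in revenue per employee. This worked calculation uses only the two figures Srinivas cited (revenue from $100 million to $500 million, headcount up 34%). The absolute headcount is not given, and it is not needed: it cancels out, because (R1/H1) / (R0/H0) = (R1/R0) / (H1/H0). The function name is illustrative.

```python
def revenue_per_employee_change(revenue_factor: float, headcount_factor: float) -> float:
    """How many times revenue per employee grew, given the two growth factors."""
    return revenue_factor / headcount_factor


revenue_factor = 500 / 100   # $100M to $500M: fivefold
headcount_factor = 1 + 0.34  # headcount up 34%

change = revenue_per_employee_change(revenue_factor, headcount_factor)
print(f"revenue per employee: {change:.2f}x")  # revenue per employee: 3.73x
```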
The AI-native growth rates that are drawing capital away from traditional SaaS companies are partly enabled by this kind of architectural efficiency, where the product improves as its underlying providers improve, without proportional cost increases. The hybrid compute platform extends that logic to hardware. If Perplexity can use the compute already sitting on users' desks to handle a meaningful share of inference work, it reduces marginal cost per query and improves response latency for lightweight tasks. As AI moves deeper into enterprise workflows, the economics of who pays for the compute (the cloud provider, the AI company, or the user's own hardware) will become a critical competitive variable.
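The claim that offloading "reduces marginal cost per query" follows from a simple expected-cost calculation. This is a sketch under stated assumptions: the function name and the per-query price are hypothetical (not Perplexity figures), on-device work is treated as free to the provider because the user's hardware pays for it (which is exactly the "who pays" question above), and the overhead of the routing step itself is ignored.

```python
def provider_cost_per_query(local_share: float, cloud_cost: float) -> float:
    """Expected provider-side cost per query when part of the work runs on users' PCs.

    Assumes on-device inference costs the provider nothing (the user's hardware
    pays for it) and ignores any overhead from the routing step itself.
    """
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    return (1.0 - local_share) * cloud_cost


# Hypothetical numbers, not Perplexity figures: $0.01 per query if everything
# ran in the cloud, and 40% of the work handled locally.
print(f"${provider_cost_per_query(0.40, 0.01):.4f} per query")
```

The provider's saving scales linearly with the local share, which is why the share of work a given PC can absorb matters so much to the economics.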
[3]
Perplexity Computer adding ability to split tasks between local and cloud models
Perplexity has announced a major new feature coming soon to Perplexity Computer: the ability to split tasks between local and cloud models. Perplexity Computer is the company's agentic system for putting AI to work for you. The upcoming task-splitting feature will let Perplexity Computer switch between on-device AI models and more powerful server-based models for different jobs. The idea is that one system will be able to handle sensitive data locally while also accessing more powerful frontier models when needed. Here's Perplexity on why it's building the new hybrid AI feature:
"Hybrid agentic inference is for work that includes sensitive data but needs powerful AI. Things like financial records, health information, and personal files. The compact model runs locally on your device to determine when sensitive data should also be kept locally. Meanwhile, work that needs a frontier model's full capability runs on the server. Most real tasks are a mix, so Personal Computer splits them and coordinates the parts. Unlike tools that ask you to pick local or cloud up front, this happens on its own, task by task."
Perplexity says the hybrid AI orchestration feature will arrive for Perplexity Computer in July.
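Perplexity's description implies two steps: a local judgement about which parts of a task are sensitive, then a split into on-device and server parts. The following is a minimal sketch, assuming the task has already been broken into parts; `Part`, `is_sensitive`, `split_task` and `SENSITIVE_MARKERS` are hypothetical, and the keyword check is a deliberately crude stand-in for the compact local model Perplexity describes. Coordinating and merging the results is not shown.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the compact on-device model: a keyword check.
# Perplexity says a local model makes this judgement; the markers are illustrative.
SENSITIVE_MARKERS = ("account number", "diagnosis", "salary", "passport")


@dataclass
class Part:
    description: str
    text: str


def is_sensitive(part: Part) -> bool:
    lowered = part.text.lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def split_task(parts: list[Part]) -> tuple[list[Part], list[Part]]:
    """Partition a mixed task into parts kept on-device and parts sent to the server."""
    on_device: list[Part] = []
    on_server: list[Part] = []
    for part in parts:
        (on_device if is_sensitive(part) else on_server).append(part)
    return on_device, on_server


parts = [
    Part("extract totals", "Salary and account number from my tax return"),
    Part("market context", "Summarise public analyst views on index funds"),
]
local, server = split_task(parts)
print([p.description for p in local], [p.description for p in server])
```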
[4]
Perplexity AI unveils hybrid local-cloud inference system at Computex 2026
Perplexity AI, the fast-growing search startup now valued at $20 billion, unveiled what it calls the first hybrid local-server inference orchestrator at Computex 2026 on Monday night, demonstrating software that autonomously decides -- in real time and mid-task -- which AI workloads stay on a user's device and which get routed to frontier models in the cloud. CEO Aravind Srinivas demonstrated the system onstage alongside Intel CEO Lip-Bu Tan during Intel's keynote address, using Perplexity's "Personal Computer" agent to process confidential deal materials. In the demonstration, local models running on Intel Core Ultra Series 3 determined which information should remain on the device and which information could be sent to cloud-based models. Srinivas said the approach balances intelligence, accuracy, privacy, and cost.
The key claim is not that a model can run locally -- dozens of tools already do that. It is that Perplexity's system makes the routing decision itself, task by task, without requiring the user to choose in advance. Sensitive data like financial records or health information stays on the local machine; the heavier reasoning tasks that require frontier-scale models get sent to the cloud. One task, multiple execution locations, automatic orchestration. "No product has done this before," a Perplexity spokesperson said in an email to VentureBeat. The product is not yet available to users; according to the company, the hybrid inference feature will launch in the coming weeks.
Perplexity's road from cloud-only agents to on-device AI orchestration
To understand why the Computex demonstration matters, it helps to trace the product arc Perplexity has been building since early this year. On February 25, Perplexity launched Computer, a multi-model AI agent that orchestrates 19 different AI models to complete complex, long-running tasks on behalf of users.
The system ran entirely in the cloud, breaking goals into subtasks and routing each to whichever model -- Claude, Gemini, GPT, Grok, or others -- was best suited for the job. Perplexity Computer unified every current AI capability into a single system, functioning as a general-purpose digital worker that operates the same interfaces a user does.
Then, in March, Perplexity introduced Personal Computer at its inaugural Ask 2026 developer conference. That product launched as a new Mac app with support for a hybrid local-cloud AI agent, which Perplexity described as a "personal orchestrator" that hybridizes local and server environments for security and productivity. Personal Computer could access the Mac's file system and native Mac apps to create and execute entire workflows, with files created in a secure sandbox and all actions auditable and reversible.
What Srinivas demonstrated at Computex extends this architecture in a fundamental way. Previously, even the Personal Computer product divided labor along relatively clear lines: local file access on the device, heavy computation on Perplexity's servers. The new hybrid inference orchestrator gives the system itself the ability to reason about where each piece of a task should execute -- not just which model to use, but which physical location should process it. The system reportedly asks for user permission before sending sensitive tasks to the cloud, a design choice that addresses one of the central anxieties enterprises have about agentic AI: data governance.
Why Nvidia's RTX Spark and Intel's new silicon make the timing strategic
The timing of the demonstration is not coincidental. Computex 2026 has been dominated by a single theme: on-device AI. Just hours before the Intel keynote, Nvidia CEO Jensen Huang unveiled the RTX Spark, a new Arm-based superchip that the company positions as the foundation for a new generation of AI-native Windows PCs.
At full strength, the RTX Spark Superchip offers up to 20 Arm CPU cores, a Blackwell GPU with 6,144 CUDA cores, 128GB of LPDDR5X RAM, and up to 300 GB/s of memory bandwidth -- enough power and memory for AI agents and 120-billion-parameter models with context lengths stretching to a million tokens. RTX Spark systems will begin arriving in the fall. Intel, not to be outdone, used its keynote to showcase Xeon 6+ processors with 288 efficiency cores built on 18A technology for the data center, and positioned its Core Ultra Series 3 as the client silicon that makes hybrid inference possible on the PC. Perplexity's hybrid orchestrator sits at the intersection of both strategies. If the system performs as advertised, it creates a direct economic incentive for users -- and eventually enterprises -- to invest in more powerful local silicon. The more capable the on-device chip, the more inference can run locally, reducing cloud costs and improving latency for sensitive workloads. That dynamic benefits Nvidia, Intel, and every other chipmaker competing for AI PC sockets. The implications extend well beyond chip economics. "As chips become more powerful, more intelligence moves onto a person's machine, alongside server inference for the complex tasks that still need frontier models," a Perplexity spokesperson told VentureBeat. "Sensitive and sovereign work can stay local, which changes the need for massive country-level infrastructure." That last claim -- about sovereign infrastructure -- is the most provocative. Nations from the UAE to France to India have been investing billions in domestic AI compute capacity partly on the assumption that sensitive data must stay within their borders, which means building or buying access to local data centers. If meaningful inference can run on an end user's device with no data leaving the machine, the calculus changes. It does not eliminate the need for data centers, but it could soften the urgency of the buildout. 
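A back-of-envelope check on the RTX Spark figures: the weights of a 120-billion-parameter model alone occupy parameters × bits per weight ÷ 8 bytes. This sketch treats both figures as decimal gigabytes and ignores the KV cache (which a million-token context would make large), activations and the operating system's share, so it is only a lower bound. The article does not say which weight precision the quoted configuration uses; the names below are illustrative.

```python
GB = 1e9  # decimal gigabytes


def weights_gb(params: float, bits_per_weight: int) -> float:
    """Memory for the model weights alone: no KV cache, activations or OS share."""
    return params * bits_per_weight / 8 / GB


PARAMS = 120e9   # the 120-billion-parameter figure quoted for RTX Spark
MEMORY_GB = 128  # the quoted LPDDR5X capacity

for bits in (16, 8, 4):
    need = weights_gb(PARAMS, bits)
    print(f"{bits:>2}-bit weights: {need:.0f} GB, under {MEMORY_GB} GB: {need < MEMORY_GB}")
```

At 16-bit precision the weights alone (240 GB) exceed 128GB; at 8-bit they take 120 GB, leaving little room; only lower-precision formats such as 4-bit (60 GB) leave substantial headroom for long contexts.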
The model-agnostic architecture that makes hybrid inference possible
Perplexity's hybrid inference play rests on the same architectural bet the company has been making all year: that the orchestration layer matters more than any individual model. For AI engineers, this signals a fundamental shift. The key insight is separation of concerns: the orchestration layer handles task decomposition, state management, and tool coordination, while the model layer handles specific computations. This decoupling means teams can swap models as better alternatives emerge without redesigning the entire system.
Perplexity has leaned heavily into this philosophy. The company is doubling down on packaging frontier models in a consumer-friendly user experience, arguing that there is value in orchestrating multiple third-party LLMs to obtain the most cost-effective and accurate answers to queries. Models, in Perplexity's view, are specializing, not commoditizing.
The hybrid inference extension takes that logic one step further. Perplexity is now orchestrating not just across models but across physical compute locations -- choosing which model runs where. A lightweight local model might handle a privacy-sensitive document summarization task while a frontier cloud model tackles the complex reasoning required to analyze that summary against a broader market landscape. The orchestrator manages the handoff.
This is a technically ambitious claim. Making it work reliably in production will require the orchestrator to accurately assess the complexity of each subtask, understand the sensitivity of the data involved, know the capabilities and latency characteristics of whatever local hardware the user has, and manage the state of a task that may be bouncing between environments mid-execution.
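The separation of concerns described above can be sketched in a few lines: the orchestrator depends only on a small model interface, so models can be registered or swapped without touching it, and its placement decision covers both which model and where it runs. This is a minimal sketch under stated assumptions: `Model`, `StubModel`, `Orchestrator` and the integer `max_complexity` score are hypothetical, and the hard parts the article lists (estimating complexity, probing local hardware, moving task state between environments) are reduced to inputs the caller supplies.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Protocol


class Model(Protocol):
    """Model layer: the orchestrator only relies on these members."""
    name: str
    location: str        # "local" or "cloud"
    max_complexity: int  # hypothetical capability score

    def generate(self, prompt: str) -> str: ...


@dataclass
class StubModel:
    """Placeholder backend so the sketch runs without any real model."""
    name: str
    location: str
    max_complexity: int

    def generate(self, prompt: str) -> str:
        return f"[{self.name}@{self.location}] {prompt}"


class Orchestrator:
    """Orchestration layer: chooses a model and a location per subtask."""

    def __init__(self, models: Iterable[Model]) -> None:
        self.models = list(models)

    def register(self, model: Model) -> None:
        # Swapping in a better model is a registration, not a redesign.
        self.models.append(model)

    def pick(self, complexity: int, sensitive: bool) -> Model:
        able = [m for m in self.models if m.max_complexity >= complexity]
        if sensitive:
            # Assumption: sensitive subtasks are only ever placed on local models.
            able = [m for m in able if m.location == "local"]
        if not able:
            raise LookupError("no registered model can take this subtask")
        # Prefer the least capable model that suffices, as a proxy for cheapest.
        return min(able, key=lambda m: m.max_complexity)

    def run(self, prompt: str, complexity: int, sensitive: bool) -> str:
        return self.pick(complexity, sensitive).generate(prompt)


orch = Orchestrator([StubModel("small", "local", 3), StubModel("frontier", "cloud", 10)])
print(orch.run("summarise this contract", complexity=2, sensitive=True))
```

Raising an error for a sensitive subtask that no local model can handle leaves the next decision (ask the user, degrade, or refuse) to the caller rather than silently sending data off the device.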
It is easy to imagine edge cases where the routing logic fails in either direction: sending something sensitive to the cloud, or degrading performance by assigning a task to an underpowered local model. Perplexity says the system will be chip-agnostic, though the initial Computex demo ran on Intel silicon. The company expressed enthusiasm in its communications about the new AI chips announced at Computex this week, suggesting it intends to optimize across vendors.
A $20 billion valuation, nine lawsuits, and the pressure to deliver
The hybrid inference announcement arrives at a complicated moment for Perplexity. The company has been on a remarkable growth trajectory: It secured $200 million in new capital at a $20 billion valuation, just two months after raising $100 million at an $18 billion valuation. Since its founding three years ago, the rapidly growing AI company has raised $1.5 billion in total funding, according to PitchBook data.
But the company also faces a mounting stack of legal challenges. Nine organizations have filed active suits against Perplexity for alleged copyright and trademark infringement as of May 31, 2026: CNN, the New York Times, News Corp and Dow Jones, the New York Post, the Chicago Tribune, Encyclopedia Britannica, Merriam-Webster, Reddit, and Japan's Yomiuri Shimbun. The CNN lawsuit, filed just days ago on May 28, is the most recent, accusing Perplexity of scraping more than 17,000 CNN stories, photos, videos, and other content and using that material to train its products. Perplexity has responded with a consistent message. "You can't copyright facts," the company's chief communications officer Jesse Dwyer said in a statement.
Other publishers have opted for partnership over litigation. Time, Gannett, Le Monde, and Der Spiegel have signed licensing arrangements with Perplexity. The company launched a Publishers Program in mid-2024 in which participating outlets receive a share of revenue generated when their content is cited in Perplexity answers.
According to CNBC, Perplexity's chief business officer Dmitry Shevelenko confirmed at the time that the flat rate was a double-digit percentage but declined to share specifics. As TechCrunch reported in December 2024, additional publishers including the LA Times, Adweek, The Independent, and Lee Enterprises subsequently joined the program, though not without internal controversy -- reporters at some outlets told TechCrunch they were not informed of the deals before they were announced publicly. The legal risk is not existential, but it is material, and with enterprises increasingly evaluating Perplexity's tools for sensitive workflows -- precisely the use case the hybrid inference system is designed to serve -- unresolved intellectual property questions could dampen adoption.
How hybrid inference sharpens Perplexity's enterprise ambitions
The hybrid inference demo should be read alongside Perplexity's broader push into enterprise software, a transformation that accelerated dramatically this year. VentureBeat reported that Perplexity announced Computer for Enterprise at the Ask 2026 developer conference in March, positioning the three-year-old startup as a direct competitor to Microsoft, Salesforce, and the legacy enterprise software stack. Beyond Computer's existing 100-plus integrations, enterprise customers gained access to business-grade connectors for Snowflake, Datadog, Salesforce, SharePoint, and HubSpot, with administrators able to install custom connectors via the Model Context Protocol. The package also includes purpose-built workflow templates for legal contract review, finance audit support, sales call preparation, and customer support ticket triage, alongside SOC 2 Type II certification and the option for zero data retention. Hybrid inference deepens this enterprise pitch considerably.
For regulated industries -- financial services, healthcare, defense, legal -- the ability to keep sensitive data on a local device while still accessing the reasoning power of frontier cloud models is not a nice-to-have. It is a potential compliance requirement. An investment bank parsing confidential deal documents, for instance, might be unable to send those materials to a third-party cloud under existing data handling agreements. A system that can run the sensitive parsing locally while routing non-sensitive analytical tasks to the cloud offers a middle path. IDC forecasts a tenfold increase in agent usage and a thousandfold growth in inference demand by 2027, and, according to a CrewAI survey, security and governance rank as the top evaluation factor for enterprise agentic platforms. Hybrid inference speaks directly to that priority.
The race to decide where AI actually runs is just getting started
Several questions will determine whether Perplexity's Computex demonstration becomes a landmark product or a compelling prototype. The actual performance characteristics remain untested outside a controlled stage environment -- how the routing logic handles varied hardware configurations, unreliable network connections, and ambiguous data sensitivity classifications is an open question. The competitive response matters too: Google, Microsoft, Apple, and OpenAI are all building their own local-cloud AI architectures. Apple Intelligence already routes some tasks locally and some to Private Cloud Compute servers, Google's Gemini Nano runs on-device, and Microsoft's Copilot+ PCs are designed around local inference capabilities. None of these systems, however, currently offer the kind of dynamic, autonomous task-level routing Perplexity claims. Even if the technology works as demonstrated, there is the question of whether the business can keep pace with the ambition.
At a $20 billion valuation with approximately $200 million in annual recurring revenue, Perplexity trades at roughly 100x revenue, a premium requiring aggressive growth to justify. Management's $656 million 2026 revenue target implies 230% growth, creating significant execution pressure. Perplexity has built its business on a bet that the future belongs not to any single model but to the system that orchestrates all of them. At Computex, it extended that bet from the software layer to the physical layer -- from which model to which machine. In the AI industry's relentless race to build bigger data centers and train larger models, Perplexity just argued that the most important computer in the stack might be the one already sitting on your desk.
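The valuation figures in the paragraph above can be checked directly. This worked calculation uses only the numbers given there ($20 billion valuation, roughly $200 million ARR, $656 million 2026 target); the variable names are illustrative.

```python
valuation = 20e9      # $20 billion
arr = 200e6           # approximately $200 million annual recurring revenue
target_2026 = 656e6   # management's $656 million 2026 revenue target

revenue_multiple = valuation / arr       # valuation as a multiple of revenue
implied_growth = target_2026 / arr - 1   # fractional growth needed to hit the target

print(f"{revenue_multiple:.0f}x revenue; target implies {implied_growth:.0%} growth")
```

The calculation gives exactly 100x and 228% growth, which the article rounds to 230%.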
[5]
Perplexity builds platform to split AI tasks between PCs and cloud By Investing.com
Investing.com -- Perplexity AI Inc. is developing a platform that distributes artificial intelligence work between personal computers and cloud-based servers to address the growing demand for AI computing power. The system functions as an air-traffic controller for AI tasks, deciding in real time which jobs can run locally on a PC and which parts require more powerful cloud servers, according to Perplexity Chief Executive Officer Aravind Srinivas. He announced the platform during the Computex conference in Taipei on Tuesday. The platform aims to reduce computing costs for AI-enabled tasks, Srinivas said in a Bloomberg Television interview. "You don't want all your compute centralized in servers and everything running through the largest models," he said. "You're already reading reports of how people are freaking out about their cost. Some people are spending half a billion dollars per month. What you actually want is for efficient value per watt per user." Srinivas made the announcement alongside Intel Corp. CEO Lip-Bu Tan, whose company leads the market for PC processors. The platform works with other technology, including Nvidia Corp. processors, and Perplexity plans to remain "chip agnostic," Srinivas said. The AI search company posted revenue growth from $100 million to $500 million, while headcount increased 34%, Srinivas wrote on X in April. The growth from larger competitors such as OpenAI and Anthropic PBC has helped fuel Perplexity's expansion, Srinivas said. "Every time any of the AI gets better, our unified system also gets better because we route across all of them," he said. This article was generated with the support of AI and reviewed by an editor.
Perplexity AI introduced a hybrid local-cloud inference system at Computex that automatically routes AI tasks between personal computers and cloud servers in real time. CEO Aravind Srinivas demonstrated the technology alongside Intel, showing how the system keeps sensitive data on-device while sending complex work to frontier models in the cloud—addressing both privacy concerns and the cost crisis of centralized AI inference.
Perplexity AI unveiled a hybrid local-cloud inference system at Computex 2026 that splits AI workloads between users' devices and cloud servers [4]. CEO Aravind Srinivas demonstrated the platform alongside Intel CEO Lip-Bu Tan during Intel's keynote address, describing it as an "air-traffic controller for AI tasks" that decides in real time which operations run locally on a user's device and which require cloud servers [2]. The system will be added to Personal Computer, Perplexity's AI agent that works across files, apps, and the web, with the hybrid AI feature launching in July [1].
What sets this approach apart is that the system makes routing decisions autonomously, task by task, without requiring users to choose between on-device and cloud-based AI models upfront [3]. A smaller model running locally decides which information should remain on the device and which can be sent to more powerful frontier models in the cloud [4]. "No product has done this before," a Perplexity spokesperson told VentureBeat [4].
The announcement comes as companies grapple with massive AI infrastructure expenses. Srinivas referenced reports of organizations "spending half a billion dollars per month" on AI compute, emphasizing the need for "efficient value per watt per user" [5]. OpenAI's infrastructure costs have been widely reported at that scale, while Anthropic's projected $10.9 billion in Q2 revenue comes with substantial compute expenses that compress margins [2]. By offloading AI inference tasks to the billions of PCs already in circulation, Perplexity can serve more users while reducing the burden on data centers [2].
The hybrid system addresses privacy concerns by keeping sensitive data processing on local devices. Financial records, health information, and personal files can be handled by compact models running directly on user hardware without ever touching cloud computing infrastructure [3]. Meanwhile, complex tasks requiring frontier model capabilities, such as multi-step reasoning or retrieval-augmented generation across large datasets, get routed to servers [2]. The system reportedly asks for user permission before sending sensitive tasks to the cloud, addressing data governance anxieties that enterprises have about agentic AI [4].
While Srinivas made the announcement alongside Intel's CEO, he emphasized that the platform remains chip-agnostic and works with Nvidia processors as well as other local silicon [5]. The timing aligns strategically with major hardware announcements at Computex, where Nvidia unveiled its RTX Spark platform for AI-powered laptops and desktops [1]. Intel showcased its Core Ultra Series 3 processors as the client silicon enabling hybrid inference on PCs [4].
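The reported permission step, in which the system asks before sending sensitive tasks to the cloud, can be sketched as a gate in front of the routing decision. This is a minimal sketch, assuming each subtask has already been labelled as sensitive or not and as needing a frontier model or not; `place_subtask` and `ask_user` are hypothetical names, and falling back to the local model when the user declines is an assumption, since the sources do not say what happens in that case.

```python
from typing import Callable


def place_subtask(sensitive: bool, needs_frontier: bool,
                  ask_user: Callable[[str], bool]) -> str:
    """Return "local" or "cloud", asking before sensitive data leaves the device."""
    if not needs_frontier:
        return "local"  # the on-device model can handle it; nothing leaves the PC
    if not sensitive:
        return "cloud"  # non-sensitive work that needs a frontier model
    # Sensitive work that would benefit from a frontier model: ask first.
    if ask_user("This step involves sensitive data. Send it to a cloud model?"):
        return "cloud"
    return "local"  # assumption: on refusal, fall back to the local model
```

Passing the prompt in as a callback keeps the routing logic testable and lets a desktop app show its own consent dialog; the user is only asked in the one case where privacy and capability actually conflict.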
The AI orchestrator creates direct economic incentives for users and enterprises to invest in more powerful local silicon. The more capable the on-device chip, the more inference can run locally, reducing cloud costs and improving latency for sensitive AI workloads [4]. This dynamic benefits chipmakers competing for AI PC market share while giving Perplexity a competitive edge in cost efficiency.
Perplexity's financial trajectory underscores why cost efficiency matters for AI companies. Srinivas posted on X in April that the company's revenue grew fivefold, from $100 million to $500 million, while headcount increased just 34% [2]. That ratio reflects the leverage of AI-native business models and Perplexity's position as an aggregator that routes queries across multiple AI providers. "Every time any of the AI gets better, our unified system also gets better because we route across all of them," Srinivas explained [5].
The hybrid compute platform extends this architectural efficiency to hardware. If Perplexity can use the compute already sitting on users' desks to handle a meaningful share of inference work, it reduces marginal cost per query while improving response times for lightweight tasks [2]. As AI moves deeper into enterprise workflows, the economics of who pays for compute (the cloud provider, the AI company, or the user's own hardware) will become a critical competitive variable. Personal Computer is currently available through the company's Mac app, with Windows support coming soon [1].
Summarized by Navi