3 Sources
[1]
xAI debuts a faster and more cost-effective version of Grok 4
A few months after the release of Grok 4, and an extremely problematic antisemitic meltdown by its chatbot, xAI is already trying to move on with its latest AI model. Elon Musk's xAI announced the release of Grok 4 Fast, a faster, more efficient reasoning model than its recent predecessor. According to xAI, Grok 4 Fast offers similar performance to Grok 4 while using 40 percent fewer thinking tokens on average. Along with faster results, xAI said Grok 4 Fast "results in a 98% reduction in price to achieve the same performance on frontier benchmarks as Grok 4," whether it's handling tasks that involve writing code or just browsing the web for quick responses. Similar to OpenAI's GPT-5, which alternates between a fast, efficient model and a deeper reasoning model, xAI's latest release uses a unified architecture that can transition between handling complex requests with its "reasoning" model and quick responses through its "non-reasoning" model. In tests on LMArena, a platform that pits AI models against each other in side-by-side comparisons, Grok 4 Fast ranks first in search-related tasks and eighth in text-related tasks. xAI has made Grok 4 Fast available to all users, including free ones, on web, iOS and Android. Still, with how competitive the LLM race is getting, it's only a matter of time before Google releases the next-gen version of Gemini or Anthropic updates its Claude Opus model beyond the recently released 4.1 version.
[2]
What to know about Grok 4 Fast for enterprise use cases
With all the AI news coming out each week, some of the more significant advancements can be hard to track. But xAI's new Grok 4 Fast model, released last Friday, is worth close consideration by enterprises and technical decision makers -- despite ongoing statements by xAI founder Elon Musk about making Grok conform more closely to his politics and worldview, and its prior "MechaHitler" scandal on Musk's social network, X.

Grok 4 Fast is a streamlined version of xAI's flagship Grok 4 model, released back in July 2025. The new version is designed to deliver near-frontier-level performance at dramatically lower cost. Built on the same infrastructure that powers xAI's most advanced systems, Grok 4 Fast is already reshaping cost/performance charts across the AI ecosystem, as shown in new analyses by researchers such as University of Pennsylvania Wharton School of Business professor Ethan Mollick and third-party AI benchmarking firm Artificial Analysis. For enterprises, the launch signals two things:

According to the official model card, Grok 4 Fast also introduces a "skip reasoning" mode for ultra-low-latency applications, enabling enterprises to trade off depth of analysis for speed when appropriate.

Performance: near-frontier results with fewer tokens

According to xAI's official announcement, Grok 4 Fast matches or comes close to Grok 4 on most headline benchmarks while using about 40% fewer "thinking tokens." Tokens, of course, are the numerical representations of words and word fragments, code strings, and other units of information that an AI large language model (LLM) can ingest and output -- an LLM's "native language."
"Thinking tokens" are those generated during a reasoning model's "chain-of-thought" process. They may never be output as part of the response to the user, yet they still consume energy and add cost, since most AI providers, including xAI, charge for developer access to their models through an application programming interface (API) at a per-million-token rate. But we'll cover that in a bit.

Back to benchmarks: On AIME 2025 math, for instance, Grok 4 Fast scored 92% versus Grok 4's 91.7%; on GPQA Diamond, 85.7% versus 87.5%. Benchmarks in browsing and search tasks also show improvements: Grok 4 Fast scored 74% on xAI's X Bench Deepsearch, up from Grok 4's 66%.

Independent evaluators back up these claims. Artificial Analysis places Grok 4 Fast at the top of its Intelligence Index on a price-per-million-token basis -- up to 64x cheaper than early frontier models such as OpenAI's o3 at launch, and about 12x cheaper than o3's current rates. A chart posted by Mollick on X shows Grok 4 Fast out on the far right of the GPQA/cost curve, indicating a new efficiency frontier. xAI's model card highlights training the model with "large-scale reinforcement learning to maximize intelligence density" and explicitly post-training it on tool use and safety demonstrations.

Cost and licensing

Grok 4 Fast is a proprietary model (not open source) available via the xAI API, OpenRouter, and Vercel AI Gateway. xAI has split the release into two SKUs: "grok-4-fast-reasoning" and "grok-4-fast-non-reasoning." Both support a 2 million-token context window, far larger than most commercial models, and both are capped at 4 million tokens per minute and 480 requests per minute (RPM). This pricing undercuts other "intelligence index >60" models and allows enterprises to run heavier workloads (legal analysis, software engineering, customer support, search augmentation) at far lower marginal cost.
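To make the per-million-token billing described above concrete, here is a minimal sketch of how thinking tokens inflate an API bill. It assumes reasoning tokens are billed at the output rate (common practice among providers, though the article does not state xAI's exact accounting), and uses Grok 4 (0709)'s listed $3.00/$15.00 per-million rates; the token counts are made up for illustration.

```python
# Sketch: per-million-token billing with "thinking tokens".
# Assumption: chain-of-thought tokens are billed at the output rate.
# Rates here are Grok 4 (0709)'s listed $3 input / $15 output per million;
# the request's token counts are hypothetical.

def request_cost(input_tokens: int, output_tokens: int, thinking_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost in dollars for one API request."""
    billable_out = output_tokens + thinking_tokens  # thinking billed as output
    return (input_tokens * price_in_per_m +
            billable_out * price_out_per_m) / 1_000_000

# Same visible answer, but thinking with 40% fewer tokens:
base = request_cost(2_000, 500, 10_000, price_in_per_m=3.00, price_out_per_m=15.00)
lean = request_cost(2_000, 500, 6_000,  price_in_per_m=3.00, price_out_per_m=15.00)
print(f"baseline: ${base:.4f}, with 40% fewer thinking tokens: ${lean:.4f}")
```

Even before any headline price cut, trimming chain-of-thought by 40% shrinks the bill for this hypothetical request from roughly $0.16 to $0.10, because hidden reasoning tokens dominate the billable output.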
xAI also offers a $0.05 per million cached input token option, which can further cut costs for repeated prompts and retrieval-augmented workloads. Older Grok models cost dramatically more: Grok 4 (0709) is listed at $3.00 input/$15.00 output per million tokens with only a 256k context -- underscoring Grok 4 Fast's steep price-to-performance advantage. Interestingly, xAI also states in its API documentation that it will fine users every time a "request is deemed to be in violation of our usage guideline by our system," specifically a "$0.05 per request usage guidelines violation fee."

For enterprises planning high-volume deployments, note that regional endpoints and rate limits differ for some legacy vision models, but Grok 4 Fast appears globally available with consistent limits. The model card makes clear that the API enforces a fixed system prompt prefix that embeds xAI's default safety policy; custom system messages from enterprise customers are appended to, not replaced by, this safety prompt.

Key differentiators for enterprise use

1. Unified reasoning and non-reasoning modes

Earlier xAI models required separate weights for reasoning versus quick-answer tasks. Grok 4 Fast unifies these in a single architecture, cutting latency and simplifying integration. Developers can still tune via system prompts for more speed or more depth. The model card also notes that enabling reasoning mode generally lowers dishonesty and sycophancy rates compared to non-reasoning mode, a relevant point for enterprises needing factual accuracy.

2. State-of-the-art search and agentic capabilities

Trained end-to-end with tool-use reinforcement learning, Grok 4 Fast can browse the web, query X in real time, follow links, ingest media, and synthesize findings. Benchmarks such as BrowseComp and X Browse show Grok 4 Fast outpacing Grok 4 in multi-hop search.
However, the model card explicitly calls out that these advanced "agentic" capabilities introduce additional risks (such as autonomous action toward harmful goals), which xAI measures and mitigates with the AgentHarm and AgentDojo benchmarks. In AgentHarm, the model completed only about 8-10% of malicious agentic tasks depending on mode, and in AgentDojo its attack success rate fell to 0-3%. In practice, that means Grok 4 Fast was largely able to refuse or deflect harmful or hijacking prompts even under adversarial conditions, indicating a high degree of robustness for enterprise deployments. As the model card notes, though, these evaluations are run under lab conditions; production deployments should still layer in their own access controls, auditing, and rate limiting for safety-critical contexts.

3. Long context window

At a whopping 2 million tokens, Grok 4 Fast leads nearly all LLMs in the amount of information that can be exchanged between the user and the model in a single input/output interaction. OpenAI's flagship GPT-5 model offers only 256,000 tokens, for instance, while Google Gemini 2.5 Pro is still at 1 million despite a pledge from Google to double that -- which would only match Grok 4 Fast. Two million tokens is roughly equivalent to 3,000 pages of text -- about the size of 10 books -- all of which can be exchanged in one interaction. That means Grok 4 Fast can handle full knowledge bases, codebases, or legal documents, making it especially suitable for enterprise knowledge management, large-scale search, or retrieval-augmented generation (RAG) pipelines -- the latter a common method for securely hooking up third-party AI models like Grok 4 Fast and its rivals to enterprise knowledge bases and data.

4. Price and token efficiency

Using 40% fewer thinking tokens for the same scores means lower inference bills and potentially lower latency. This is crucial for SaaS or consumer applications that depend on high query volumes.
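To sanity-check the "2 million tokens is roughly 3,000 pages" claim, here is a back-of-the-envelope sizing sketch. The ~4 characters-per-token ratio is a common rule of thumb for English text, not a property of xAI's tokenizer, and the page size and output reserve are assumptions.

```python
# Rough sizing check for a 2 million-token context window.
# Assumptions: ~4 chars per token (English-text rule of thumb),
# ~2,500 characters per "page", and a small budget reserved for the reply.

CONTEXT_WINDOW = 2_000_000
CHARS_PER_TOKEN = 4  # heuristic, not xAI's actual tokenizer ratio

def estimated_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(docs: list[str], reserve_for_output: int = 8_000) -> bool:
    """True if all docs plus an output reserve fit in one request."""
    total = sum(estimated_tokens(d) for d in docs)
    return total + reserve_for_output <= CONTEXT_WINDOW

# A 3,000-page corpus at ~2,500 characters per page:
corpus = ["x" * 2_500] * 3_000
print(fits_in_context(corpus))  # → True
```

Under these assumptions the 3,000-page corpus lands at roughly 1.88 million tokens, just inside the window -- consistent with the article's estimate, though real tokenizers will vary by content.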
Drawbacks and considerations

SpeechMap compliance scores dropped. These scores measure how often the model generates controversial speech when instructed by a user. Independent evaluator SpeechMap.AI reports Grok 4 Fast scored only 77.5%-77.9% compliance, compared to 98% for Grok 4 and over 90% for rival Sonoma models. xAI engineer Norman Mu confirmed on X that the higher refusal rates were "an unintended side effect" of new training to prevent misuse, and pledged improvements. Enterprise customers building in regulated or sensitive domains should test prompt compliance carefully.

GPQA Diamond likely saturated. Analysts note that leading models are clustering near the top of GPQA Diamond scores, suggesting this benchmark may no longer differentiate frontier reasoning quality. Enterprises should supplement with their own domain-specific evals.

Latency and stability. While Grok 4 Fast is pitched as "Fast," xAI has not published full tokens-per-second metrics. Enterprises with hard real-time needs should benchmark throughput under load. Artificial Analysis shows Grok 4 Fast is among the fastest models at 227 tokens served per second, yet it still comes in third behind OpenAI's gpt-oss-120b open source model and Google's Gemini 2.5 Pro.

Licensing and support. At launch, Grok 4 Fast is broadly available (even to free users on grok.com), but enterprise-grade SLAs and managed deployments may lag behind the API rollout. Pricing beyond the introductory period could shift.

Additional safety layers. The model card emphasizes Grok 4 Fast's built-in refusal and input filters for high-risk content -- including chemical, biological, radiological, nuclear, cyberattack, and CSAM-related prompts -- and shows a zero answer rate on such harmful requests under default settings. It also reports significantly lower attack success rates on AgentDojo prompt injection tests (0.00-0.03), which may give enterprises more confidence in production environments.
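For teams that want to run their own SpeechMap-style compliance checks, the core loop is simple: send a fixed prompt set, classify each response as a refusal or a completion, and report the completion rate. The sketch below uses canned responses and a naive keyword classifier as illustrative stand-ins; the refusal markers are assumptions, not SpeechMap.AI's actual methodology.

```python
# Minimal sketch of a SpeechMap-style compliance check.
# The refusal markers and canned responses are illustrative assumptions;
# a real harness would call the model's API and use a sturdier classifier.

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to help")

def is_refusal(response: str) -> bool:
    """Naive keyword check for a refusal-style response."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def compliance_rate(responses: list[str]) -> float:
    """Fraction of responses that complete the request rather than refuse."""
    completed = sum(1 for r in responses if not is_refusal(r))
    return completed / len(responses)

# Canned responses standing in for live model output:
sample = ["Here is an analysis...", "I can't help with that.",
          "Sure, consider...", "Certainly: ..."]
print(f"{compliance_rate(sample):.1%}")  # → 75.0%
```

A production harness would replace the canned list with real API calls over a domain-specific prompt set, and track the rate across model versions to catch regressions like the one reported above.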
Scaling story: not just brute force

Grok 4 Fast rides on xAI's massive Colossus cluster in Memphis -- reportedly hundreds of thousands of high-end GPUs -- but its defining feature is efficiency, not raw scale. By unifying reasoning modes and training for tool use, xAI is trying to do more with less compute at inference. This is a key signal for the AI industry: the next competitive edge may come from test-time optimization, tool orchestration, and smarter architectures, rather than simply throwing more GPUs at the problem. The model card also underscores xAI's transparency moves -- publishing system prompts on GitHub and detailing its training recipe -- which may reassure enterprises needing auditability or compliance evidence for regulators.

What enterprises should do now

* Pilot test high-volume tasks. Grok 4 Fast's token pricing and long context window make it attractive for batch-heavy operations such as contract analysis, data enrichment, and code review.
* Evaluate compliance and refusal behavior. If your business operates in regulated sectors, run your own SpeechMap-style tests to gauge refusal rates and bias.
* Compare latency and throughput. Use your actual workloads to measure tokens per second and see whether Grok 4 Fast meets SLA requirements.
* Plan for multi-model strategies. Given the differences between reasoning and non-reasoning modes, and the rapidly changing benchmark landscape, consider keeping at least one fallback model in production.
* Consider enabling "reasoning mode" with explicit honesty instructions for applications demanding high factual accuracy, as xAI's internal tests show lower deception rates under these conditions.

Bottom line

Grok 4 Fast is not just a cheaper Grok 4 -- it's a signal that frontier-level reasoning is becoming commoditized.
With its massive context window, unified architecture, and tool-use reinforcement learning (RL), it's built to serve enterprises needing high-volume, high-context tasks at a fraction of prior costs. The main caution is around behavioral consistency and refusal rates, which xAI acknowledges are still being tuned. For most enterprise use cases, though, Grok 4 Fast represents one of the most compelling cost-efficiency options on the market today -- a chance to integrate frontier reasoning into customer-facing services or internal workflows without frontier-level bills. And unlike many competitors, Grok 4 Fast comes with a publicly documented safety approach, including benchmarks for abuse potential, deception, political bias, and dual-use knowledge -- giving enterprise leaders more insight into the trade-offs behind the model's performance.
[3]
Elon Musk's xAI Launches Grok 4 Fast With 2M Token Limit and 40% Lower Costs
xAI Launches Grok 4 Fast, Cutting Token Use by 40% While Matching Grok 4 Accuracy. Available Across Web, Apps, and APIs with Flexible Pricing

Elon Musk's xAI has launched a new AI model, Grok 4 Fast. The model aims to keep costs low and maintain competitive accuracy by combining non-reasoning and reasoning abilities into a single system, thereby eliminating the need for separate frameworks. According to xAI, Grok 4 Fast uses roughly 40% fewer thinking tokens than Grok 4, yet its benchmark results come in close to Grok 4's. Based on independent analysis by Artificial Analysis, Grok 4 Fast can deliver the same performance at about 98% lower cost, sharply improving its cost-performance ratio. On AIME 2025, HMMT 2025, and the GPQA Diamond test, the model scored 92%, 93.3%, and 85.7%, respectively. Additionally, it scored 95% on SimpleQA and 74% on X Bench Deepsearch, meaning it can be applied to various tasks, including code execution and sophisticated search.
xAI unveils Grok 4 Fast, a new AI model that offers similar performance to its predecessor while using 40% fewer tokens and reducing costs by 98%. The model features a unified architecture for both reasoning and non-reasoning tasks, making it highly flexible for various applications.
xAI, the artificial intelligence company founded by Elon Musk, has announced the release of Grok 4 Fast, a new version of its flagship AI model that promises improved efficiency and cost-effectiveness. This latest iteration comes just months after the release of Grok 4 and aims to address the growing demand for more accessible and powerful AI solutions [1].
Grok 4 Fast boasts similar performance to its predecessor while using 40% fewer "thinking tokens" on average. This reduction in token usage translates to a significant 98% decrease in price to achieve comparable performance on frontier benchmarks [1][2]. The model has demonstrated impressive results on various benchmarks [2][3].
One of the key innovations in Grok 4 Fast is its unified architecture, which combines non-reasoning and reasoning abilities into a single system. This approach eliminates the need for separate frameworks and allows for seamless transitions between handling complex requests and providing quick responses [1][2].
xAI has made Grok 4 Fast available through multiple channels, including web, iOS, and Android platforms, as well as via API access. The company offers two main SKUs for the model: "grok-4-fast-reasoning" and "grok-4-fast-non-reasoning." Both versions support a 2 million-token context window, which is significantly larger than most commercial models. This pricing structure undercuts other high-performance models and allows for more cost-effective deployment of heavy workloads such as legal analysis, software engineering, and customer support [2].
The release of Grok 4 Fast signals a new frontier in the cost-performance ratio of AI models. Independent evaluators, including Artificial Analysis and Professor Ethan Mollick of the University of Pennsylvania's Wharton School of Business, have placed Grok 4 Fast at the top of efficiency charts [2].

As the AI industry continues to evolve rapidly, xAI's latest offering presents a compelling option for enterprises looking to leverage powerful AI capabilities while managing costs. However, the competitive landscape remains dynamic, with other major players like Google and Anthropic expected to release updates to their respective models in the near future [1][2].