NVIDIA's Blackwell GPUs Break AI Performance Barriers, Achieving Over 1,000 TPS/User with Meta's Llama 4 Maverick

Reviewed by Nidhi Govil

NVIDIA sets a new world record in AI performance with its DGX B200 Blackwell node, surpassing 1,000 tokens per second per user using Meta's Llama 4 Maverick model, showcasing significant advancements in AI processing capabilities.

NVIDIA Shatters AI Performance Records with Blackwell GPUs

NVIDIA has once again pushed the boundaries of AI performance, breaking the 1,000 tokens per second (TPS) per user barrier with Meta's Llama 4 Maverick large language model. The record was set on NVIDIA's latest DGX B200 node, which packs eight Blackwell GPUs [1].

Source: Tom's Hardware

Record-Breaking Performance

The new benchmark set by NVIDIA's Blackwell architecture is a significant leap forward in AI processing capabilities:

  • Achieved 1,038 TPS/user, a 31% jump over the previous record of 792 TPS/user held by SambaNova
  • Outperformed Amazon and Groq, which scored just under 300 TPS/user
  • Other providers, including Google Vertex and Azure, came in below 200 TPS/user [1]
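The 31% figure follows directly from the two scores quoted above:

```python
# Quick arithmetic check of the speedup figure (values from the article).
blackwell_tps = 1038   # NVIDIA DGX B200, TPS/user
sambanova_tps = 792    # previous record holder, TPS/user

speedup_pct = (blackwell_tps - sambanova_tps) / sambanova_tps * 100
print(f"Blackwell is {speedup_pct:.0f}% faster than the previous record")
```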

Optimizations Driving Performance Gains

NVIDIA's record-breaking result was achieved through a combination of hardware power and software optimizations:

  1. Extensive software optimization using TensorRT-LLM
  2. A speculative-decoding draft model built with EAGLE-3 techniques
  3. FP8 data types, which shrink the model's memory footprint and exploit Blackwell's FP8 throughput with minimal accuracy loss
  4. FP8 applied to the Attention and Mixture-of-Experts (MoE) operations
  5. CUDA kernel optimizations, including spatial partitioning and GEMM weight shuffling [1][2]
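As an illustration of the FP8 idea only (not NVIDIA's actual kernels), per-tensor scaled quantization with a 3-bit mantissa can be sketched in plain Python. The 448.0 constant is the largest finite value of the e4m3 format; exponent-range limits and subnormals are ignored for brevity:

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the e4m3 FP8 format

def round_to_e4m3(x):
    """Round x to a 3-bit mantissa, mimicking e4m3 precision.

    A simplified sketch: exponent-range limits and subnormals are ignored."""
    if x == 0.0:
        return 0.0
    exp = math.floor(math.log2(abs(x)))
    step = 2.0 ** (exp - 3)          # mantissa resolution at this exponent
    return round(x / step) * step

def quantize_fp8(values):
    """Per-tensor scaled quantization into the simulated e4m3 range."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale

def dequantize_fp8(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.12, -3.4, 0.0078, 2.25]
quantized, scale = quantize_fp8(weights)
restored = dequantize_fp8(quantized, scale)
# Relative round-trip error stays below 1/16 (half a 3-bit mantissa step),
# which is why FP8 costs so little accuracy at inference time.
```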

These optimizations resulted in a 4x performance uplift compared to Blackwell's previous best results.

Source: Wccftech

Significance of TPS/User Metric

The tokens per second per user (TPS/user) metric is crucial for AI chatbot developers:

  • Measures how fast a system can generate tokens for a single user's request
  • Directly impacts the responsiveness of AI chatbots such as ChatGPT and Copilot
  • Focuses on single-user latency rather than aggregate batched throughput [1]
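In practice the metric comes down to timing one streaming request. A minimal sketch, with `generate_token` standing in for a real model call (a hypothetical placeholder, not NVIDIA's benchmark harness):

```python
import time

def measure_tps_per_user(generate_token, prompt, max_tokens=64):
    """Measure single-user decode throughput: tokens per second for one
    request, the quantity behind the 1,038 TPS/user figure.

    `generate_token` is any callable returning the next token given the
    tokens so far; a real deployment would call a streaming model API."""
    tokens = list(prompt)
    start = time.perf_counter()
    for _ in range(max_tokens):
        tokens.append(generate_token(tokens))
    elapsed = time.perf_counter() - start
    return max_tokens / elapsed

# Toy stand-in "model": an instant next-token lookup, so the number is
# meaningless except to show the measurement shape.
tps = measure_tps_per_user(lambda toks: len(toks) % 100, [1, 2, 3])
```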

Speculative Decoding: A Key Innovation

NVIDIA's implementation of speculative decoding played a significant role in achieving this performance milestone:

  • Uses a smaller, faster "draft" model to predict several tokens ahead
  • The main (larger) model verifies these predictions in parallel
  • Accelerates inference without compromising output quality
  • Based on the EAGLE-3 architecture for LLM inference acceleration [2]
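The draft-and-verify loop can be sketched with toy stand-in models (`target_next` and `draft_next` are hypothetical greedy next-token functions, not the real Llama 4 Maverick or an EAGLE-3 draft head). The key property shown is that greedy speculative decoding produces exactly what the target model alone would:

```python
def speculative_decode(target_next, draft_next, prompt, max_tokens=20, k=4):
    """Greedy speculative decoding sketch.

    A real system runs the verification as one batched forward pass and
    uses a trained draft head; neither is modeled here."""
    out = list(prompt)
    while len(out) - len(prompt) < max_tokens:
        # 1. Draft model speculates up to k tokens ahead (cheap, sequential).
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Target model checks each draft token against its own greedy
        #    choice, accepting the longest matching prefix.
        accepted, correction = [], None
        for t in draft:
            expected = target_next(out + accepted)
            if t == expected:
                accepted.append(t)
            else:
                correction = expected  # target's own token replaces the miss
                break
        out.extend(accepted)
        if correction is not None:
            out.append(correction)
    return out[len(prompt):len(prompt) + max_tokens]

# Toy deterministic models: the draft agrees with the target only sometimes.
target_next = lambda ctx: (sum(ctx) + 1) % 7
draft_next = lambda ctx: (sum(ctx) + 1 + len(ctx) % 2) % 7
tokens = speculative_decode(target_next, draft_next, [3, 1, 4], max_tokens=10)
```

Because every emitted token is either verified by, or produced by, the target model, the output is identical to plain autoregressive decoding, only faster when the draft is usually right.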

Implications for AI Industry

NVIDIA's achievement has far-reaching implications for the AI industry:

  • Demonstrates NVIDIA's leadership in AI hardware and software optimization
  • Sets a new standard for AI performance, particularly for large language models
  • Paves the way for more responsive and efficient AI-powered applications
  • Highlights the growing importance of token generation speeds as a benchmark for AI progress [2]

As AI continues to evolve, NVIDIA's Blackwell architecture and its optimizations for large-scale LLMs position the company at the forefront of AI technology, promising faster and more seamless AI interactions in the future.

TheOutpost.ai
© 2025 Triveous Technologies Private Limited