Tensordyne's Napier AI chip claims 13x faster token throughput than Nvidia Blackwell using log math

Reviewed by Nidhi Govil

2 Sources


AI chip startup Tensordyne has taped out its Napier processor, claiming it delivers 17x more tokens per watt and 13x higher throughput than Nvidia Blackwell. The chip uses logarithmic mathematics to convert energy-intensive multiplication into simple addition. It is built on TSMC's 3nm process, and commercial systems developed with Broadcom and HPE are scheduled to ship in late 2027.

Tensordyne Challenges Nvidia with Logarithmic AI Chip Architecture

AI chip startup Tensordyne has completed the tape-out of its Napier AI chip, marking a significant milestone for a company attempting to disrupt Nvidia's dominance in AI hardware [1]. The chip, built on TSMC 3nm process technology, is now in production with commercial sales of a 72-chip system scheduled for the second half of 2027 [2]. Tensordyne claims its technology delivers 17x more tokens per watt and 13x higher token throughput compared to Nvidia Blackwell, representing a potential shift in how AI inferencing systems are designed and deployed [2].

Source: Wccftech


Logarithmic Matrix Multiplication Converts Multipliers Into Adders

The core innovation behind the Napier AI chip lies in its approach to matrix multiplication, the fundamental mathematical operation powering large language models. Tensordyne exploits the identity that the logarithm of A times B equals the logarithm of A plus the logarithm of B. "We've turned multipliers into adders," explains Gilles Backhus, a Tensordyne founder and vice president of AI [1]. Because adder circuits are smaller and more energy efficient than multipliers, Napier can pack more compute into a smaller area while consuming less power [1]. While the concept has been known for years, previous attempts failed because converting between logarithmic numbers and floating point introduced too many inaccuracies and consumed excessive time and energy. Tensordyne claims its engineers have solved this conversion challenge "very elegantly and very very accurately and cheaply on silicon" [1].
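The identity behind "multipliers into adders" can be illustrated with a minimal Python sketch. It uses ordinary floating point and base-2 logarithms purely for illustration; Tensordyne's actual number format and conversion circuits are not described in the article, and the function names here (`to_log`, `log_mul`, `from_log`, `dot`) are hypothetical.

```python
import math

def to_log(x):
    """Encode a nonzero real as (sign, log2|x|). Illustrative encoding only."""
    if x == 0:
        raise ValueError("zero has no finite logarithm; real designs need a special case")
    return (1 if x > 0 else -1, math.log2(abs(x)))

def log_mul(a, b):
    """Multiply two log-encoded values: signs multiply, logarithms add."""
    return (a[0] * b[0], a[1] + b[1])

def from_log(v):
    """Convert a log-encoded value back to an ordinary float."""
    sign, exponent = v
    return sign * 2.0 ** exponent

def dot(xs, ys):
    """Dot product with products formed in the log domain.

    Summation has no cheap log-domain equivalent, so each product is
    converted back before accumulating. That conversion is the step the
    article says earlier designs found costly and inaccurate.
    """
    return sum(from_log(log_mul(to_log(x), to_log(y))) for x, y in zip(xs, ys))

if __name__ == "__main__":
    print(dot([1.5, -2.0, 4.0], [2.0, 3.0, 0.5]))
```

The sketch shows why the conversion matters: every multiply becomes an add, but a matrix multiplication still needs sums of products, so values must cross between the two representations.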

Energy Efficient AI Chip Targets Growing Inference Market

The Napier platform addresses a shift in AI deployment economics as inference workloads increasingly dominate over training. Market trends, including the rise of AI agents, mean the cost and speed at which answers are delivered are starting to matter more than training new models [1]. Tensordyne designed its system to handle both the prefill and decode stages of large language model execution. Prefill processes the input prompt's tokens and builds the key-value cache, a computationally heavy task. Decode generates output tokens sequentially, making it more dependent on memory and network latency than on raw computing power. While competing approaches use separate systems for these tasks (such as B300 GPUs for prefill and Groq 3 processors for decode), Tensordyne claims its platform handles both without requiring multiple vendors and multiple racks [1].
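The difference between the two stages can be sketched with a toy single-head attention loop in Python. This is a simplified illustration, not Tensordyne's or any vendor's implementation: it assumes keys and values are the token vectors themselves (no learned projections), and all function names are hypothetical.

```python
import math

def attend(query, keys, values):
    """Softmax attention of one query over every cached key/value pair."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

def prefill(prompt_vectors):
    """Prefill: ingest the whole prompt and build the key-value cache.

    On real hardware this is one large batched computation, which is why
    the stage is compute-bound.
    """
    cache = {"keys": [], "values": []}
    for vec in prompt_vectors:
        cache["keys"].append(vec)
        cache["values"].append(vec)
    return cache

def decode(cache, first_query, steps):
    """Decode: produce outputs one at a time.

    Each step rereads the entire cache and depends on the previous step,
    so memory and interconnect latency dominate.
    """
    outputs, query = [], first_query
    for _ in range(steps):
        out = attend(query, cache["keys"], cache["values"])
        cache["keys"].append(out)
        cache["values"].append(out)
        outputs.append(out)
        query = out
    return outputs

if __name__ == "__main__":
    kv = prefill([[1.0, 0.0], [0.0, 1.0]])
    print(decode(kv, [1.0, 0.0], steps=3))
```

The loop in `decode` cannot be parallelized across steps because each query is the previous output, which is the property that makes a low-latency memory and network path valuable for that stage.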

Source: IEEE


Napier System Architecture Integrates Memory and Low-Latency Network

The Tensordyne Napier TDN system, designed in collaboration with Broadcom and HPE Juniper Networks, combines three key technologies [2]. Each Napier chip integrates substantial fast SRAM alongside 144 gigabytes of high-bandwidth memory to minimize idle compute cycles [1][2]. A custom low-latency network called Tensordyne Napier Link delivers sub-microsecond communication latency between processors, maximizing compute utilization [1][2]. A single pod system fitting in one quarter of a standard rack packs 72 Napier chips, 8 Intel Xeon CPUs, and 64 terabytes of solid-state storage [1].
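A quick back-of-the-envelope calculation, written as Python, shows what these figures imply for aggregate memory. The chip count, per-chip HBM capacity, and quarter-rack pod size come from the article; the `weights_gb` helper and its bytes-per-parameter argument are hypothetical, since the article does not state the numeric precision Napier uses for model weights.

```python
# Figures quoted in the article.
CHIPS_PER_POD = 72
HBM_PER_CHIP_GB = 144
PODS_PER_RACK = 4  # a pod is said to fit in one quarter of a standard rack

def pod_hbm_gb():
    """Total high-bandwidth memory in one pod, in gigabytes."""
    return CHIPS_PER_POD * HBM_PER_CHIP_GB

def rack_hbm_gb():
    """Total high-bandwidth memory in a fully populated rack, in gigabytes."""
    return PODS_PER_RACK * pod_hbm_gb()

def weights_gb(parameters, bytes_per_parameter):
    """Hypothetical helper: weight footprint in decimal gigabytes.

    bytes_per_parameter is an assumption supplied by the caller,
    not a figure from the article.
    """
    return parameters * bytes_per_parameter / 1e9

if __name__ == "__main__":
    print("HBM per pod (GB):", pod_hbm_gb())
    print("HBM per rack (GB):", rack_hbm_gb())
    # Assumed example: a 2-trillion-parameter model at 1 byte per parameter.
    print("Weights (GB):", weights_gb(2e12, 1))
```

Under the assumed 1 byte per parameter, the weights of a 2-trillion-parameter model would occupy about 2,000 GB, well below the 10,368 GB of HBM in a single pod. The remaining capacity would be available for key-value caches and activations, though the article gives no breakdown of how memory is actually used.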

Multi-Trillion Parameter Models Performance Claims Against Nvidia

Tensordyne's most aggressive claims target both current and future Nvidia platforms. The company states a 4-pod rack working on a 2-trillion parameter LLM would deliver 1300 tokens per second per user at a cost of $11 for 1 million tokens while consuming 120 kilowatts of power [1]. For multi-trillion parameter models, Tensordyne claims its platform supports throughput of 1000 tokens per second per user in a single-rack configuration, a task that would require nine Nvidia Rubin plus Groq LPX racks [2]. The company projects up to $33 million more annual revenue per rack compared to Nvidia Blackwell systems [2]. However, these figures remain based on simulations, with real systems not expected until the second half of 2027 [1]. Tensordyne has achieved beta deployment milestones and reports over $200 million in forecasted Napier system demand [2]. Whether the company can deliver on these claims will determine if this AI chip startup can genuinely compete with Nvidia in the rapidly evolving AI hardware landscape.


© 2026 TheOutpost.AI All rights reserved