Taalas raises $169 million to build custom AI chips that hardwire models into silicon

Toronto-based chip startup Taalas secured $169 million in funding to develop AI chips that embed neural networks directly into silicon. The company claims its HC1 chip runs Meta's Llama 3.1 8B model 73 times faster than Nvidia's H200 while using one-tenth the power, challenging conventional approaches to AI inference.

Taalas Secures Major Funding to Challenge Nvidia in AI Chip Market

Toronto-based chip startup Taalas announced it raised $169 million in a funding round that brings its total capital to $219 million [1]. Investors include Quiet Capital, Fidelity, and prominent semiconductor venture capitalist Pierre Lamond [1]. The startup's approach centers on hardwiring AI models into silicon to deliver faster and cheaper AI applications, positioning itself to take on Nvidia in the rapidly evolving AI inference market.

Revolutionary Approach to Model-Specific AI Chips

The chip startup Taalas has developed a distinctive method for custom-designing chips that involves printing portions of an AI model directly onto silicon. This creates bespoke processors tailored to specific models, such as Meta's Llama 3.1 8B language model [1]. CEO Ljubisa Bajic explained that "this hardwiring is partly what gives us the speed" [4]. The customized silicon pairs with large amounts of on-chip SRAM, similar to designs from Groq and Cerebras.

Source: Wccftech

Technical Innovation Through Mask ROM Recall Fabric

Taalas assembles a nearly complete chip with roughly 100 layers, then performs final customization on just two metal layers [1]. These custom layers feature what the company describes as a mask ROM recall fabric, in which each cell stores four bits using only a single transistor to process matrix multiplications, the mathematical calculations that LLMs use to make decisions [2]. This design eliminates the need for HBM modules, avoiding the delays caused by moving data to and from external memory while removing various auxiliary components [2].
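As a rough sanity check on that storage scheme, the back-of-the-envelope arithmetic below estimates the on-die capacity needed to hold Llama 3.1 8B at the article's stated four bits per mask-ROM cell. The parameter count is approximate, and real layouts would carry overhead this sketch ignores.

```python
# Rough estimate of on-die weight storage, assuming the article's
# figure of four bits per single-transistor mask-ROM cell.
params = 8_000_000_000      # Llama 3.1 8B parameter count (approx.)
bits_per_weight = 4         # one mask-ROM cell per weight (per article)

total_bits = params * bits_per_weight
total_gigabytes = total_bits / 8 / 1e9
print(f"~{total_gigabytes:.1f} GB of mask ROM")  # ~4.0 GB
```

At roughly 4 GB of weights, the figure is at least in the range where dense single-transistor ROM on a large die becomes plausible, which is consistent with the claim that external HBM can be dropped.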

Dramatic Performance Gains Over Current Solutions

The company's first product, called HC1, delivers remarkable results for AI inference tasks. Taalas claims its chip can generate 17,000 tokens per second (TPS) when running the Llama 3.1 8B model, 73 times more than Nvidia's H200 graphics card while consuming one-tenth the power [2]. According to performance data, Taalas delivers 10 times the TPS of today's high-end infrastructure while achieving 20 times lower production costs [3]. The HC1 chip uses TSMC's 6nm node and features a die size of up to 815 mm², nearly matching Nvidia's H100 dimensions [3].
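The headline numbers can be unpacked with simple arithmetic: a 73x speedup at one-tenth the power implies both the H200's absolute throughput on this workload and a combined energy-per-token advantage. This is a derivation from the article's own claims, not independently measured data.

```python
# Back out the implied H200 throughput and energy efficiency from the
# article's claims: 17,000 TPS at 73x an H200, at one-tenth the power.
taalas_tps = 17_000
speedup = 73
power_ratio = 1 / 10                       # Taalas power relative to H200

h200_tps = taalas_tps / speedup            # implied H200 rate
tokens_per_joule_gain = speedup / power_ratio
print(f"Implied H200 throughput: ~{h200_tps:.0f} TPS")
print(f"Implied efficiency gain: ~{tokens_per_joule_gain:.0f}x tokens per joule")
```

Taken at face value, the claims imply roughly 233 TPS on the H200 for this model and a ~730x improvement in tokens generated per joule of energy.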

Faster Manufacturing Timeline Creates Competitive Edge

Using TSMC for manufacturing, Taalas can complete fabrication of a chip customized for a particular model in approximately two months [1]. This contrasts sharply with the roughly six-month timeline required to fabricate an AI processor such as Nvidia's Blackwell [1]. The company's platform can transform any AI model into custom silicon, creating what it calls "Hardcore Models" that it says are an order of magnitude faster, cheaper, and lower power than software-based implementations [3].

Addressing Response Latency in Agentic AI Environments

Taalas built its approach, starting 2.5 years ago, around solving response latency, which has emerged as a major constraint for modern compute providers [3]. In agentic environments, the primary competitive advantage lies in tokens-per-second figures and task completion speed. The company's solution specializes AI workloads at the hardware level and merges storage with computation to overcome the memory wall [3]. All computation happens at DRAM-level density to ensure faster intercommunication, avoiding the complex cooling, HBM integration, and packaging required by conventional ASICs [3].

Ambitious Roadmap Targets Frontier Models

Taalas can currently produce chips capable of running less sophisticated models and is developing a new chip that can run a Llama model with 20 billion parameters, expected to be ready this summer [2]. The company plans to build a processor called HC2 capable of deploying cutting-edge models such as GPT-5.2 by the end of this year [1]. To address scaling challenges, Taalas has adopted a cluster-based approach, demonstrating 12,000 TPS per user with DeepSeek's R1 in a 30-chip configuration [3].
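To put the cluster figure in user-facing terms, the sketch below converts the claimed per-user throughput into response latency for a hypothetical long response. The 12,000 TPS number is from the article; the 10,000-token response length is an illustrative assumption, not a reported benchmark.

```python
# Illustrative per-user latency at the claimed cluster throughput
# (DeepSeek R1 on a 30-chip configuration, per the article).
tps_per_user = 12_000
response_tokens = 10_000    # hypothetical long response, for illustration

latency_s = response_tokens / tps_per_user
print(f"~{latency_s:.2f} s for a {response_tokens:,}-token response")
```

Sub-second generation of even very long outputs is the kind of latency profile the agentic-workload argument above depends on.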

Market Context and Competitive Landscape

The announcement arrives weeks after Nvidia's $20 billion deal to license intellectual property from chip startup Groq, which reignited interest in startups focused on AI inference, the process by which an AI model responds to user queries [1]. Groq's first-generation processor used an SRAM-heavy approach to chip design, as do Cerebras, which signed a cloud computing deal with OpenAI in January, and d-Matrix [1]. However, Taalas differentiates itself by pivoting away from general-purpose computing entirely toward model-specific implementations, though this means the hardware remains tied to particular LLMs, with no option to change model weights [3].
