Meta unveils four MTIA chip generations to power AI inference and reduce Nvidia dependence

Reviewed by Nidhi Govil

Meta announced four successive generations of its custom MTIA chips on March 11, scheduled for deployment through 2027. The MTIA 300, 400, 450, and 500 are optimized for AI inference workloads, with HBM bandwidth increasing 4.5 times across the lineup. Meta has already deployed hundreds of thousands of MTIA chips in production, joining Google, AWS, and Microsoft in building custom silicon to reduce reliance on Nvidia.

Meta Accelerates Custom Silicon Development with Four MTIA Chip Generations

Meta announced four successive generations of its custom Meta Training and Inference Accelerator (MTIA) chips on March 11, marking an aggressive push into in-house AI chips designed to handle the company's rapidly expanding AI inference workloads [1]. The MTIA 300, 400, 450, and 500 are all scheduled for deployment over the next two years, with Meta describing the chips as progressively optimized for AI inference on the premise that HBM memory bandwidth is the binding constraint on inference performance [1].

Source: Meta AI

The MTIA 300 is already in production for content ranking and recommendations training, while the MTIA 400, also known as Iris, has completed lab testing and is moving toward data center deployment [2]. The MTIA 450 and MTIA 500 chips, code-named Arke and Astrid respectively, are scheduled for mass deployment in early 2027 and later in 2027 [2]. According to Yee Jiun Song, Meta's vice president of engineering, the products are being built in parallel on a roughly six-month cadence, much faster than the industry's typical one-to-two-year cycle [3].

HBM Bandwidth Emerges as Critical Performance Driver for AI Inference Workloads

Meta doubled HBM bandwidth from the MTIA 400 to the 450, making it "much higher than that of existing leading commercial products," a reference to Nvidia's H100 and H200 chips [3]. The MTIA 500 then increases HBM bandwidth by an additional 50% compared with the MTIA 450, along with up to 80% more HBM capacity [1]. Across the full 300-to-500 progression, HBM bandwidth increases 4.5 times and compute FLOPs increase 25 times [3].
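
Those multipliers compose: the doubling from the 400 to the 450 and the further 50% step to the 500 account for a 3x gain, so the overall 4.5x figure implies the unstated 300-to-400 step was roughly 1.5x. A quick sanity check in Python (the per-step multipliers come from the article; the 300-to-400 value is inferred, not reported):

```python
# Per-generation HBM bandwidth multipliers reported in the article.
m_400_to_450 = 2.0   # MTIA 400 -> 450: bandwidth doubled
m_450_to_500 = 1.5   # MTIA 450 -> 500: a further 50% increase
m_300_to_500 = 4.5   # overall 300 -> 500 gain

# The 300 -> 400 step is not stated directly; it is implied by the other three figures.
m_300_to_400 = m_300_to_500 / (m_400_to_450 * m_450_to_500)
print(f"Implied MTIA 300 -> 400 bandwidth gain: {m_300_to_400:.2f}x")  # ~1.50x

# The per-step multipliers should multiply back to the overall 4.5x figure.
assert abs(m_300_to_400 * m_400_to_450 * m_450_to_500 - m_300_to_500) < 1e-9
```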

Source: Tom's Hardware

In a technical blog post, Meta described HBM bandwidth as the most important factor affecting AI inference performance, adding that mainstream chips built for large-scale pre-training are less cost-effective when applied to AI inference workloads [1]. The chips use a modular chiplet architecture that allows the MTIA 400, 450, and 500 to share the same chassis, rack, and network infrastructure, meaning each new chip generation drops into the existing physical footprint without requiring new data center buildouts [1].
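
Meta's framing reflects a standard back-of-the-envelope argument: during autoregressive decoding, each generated token requires streaming roughly the full set of model weights out of HBM, so single-stream throughput is capped by memory bandwidth divided by the model's byte footprint rather than by raw FLOPs. A minimal sketch of that bound follows, with illustrative numbers that are not MTIA or Llama specifications:

```python
# Rough sketch of the memory-bandwidth ceiling on decode-time inference: generating one
# token requires reading (roughly) every weight from HBM once, so single-stream tokens/sec
# cannot exceed bandwidth divided by the model's byte footprint. Numbers are placeholders.

def decode_tokens_per_sec_ceiling(
    hbm_bandwidth_tb_s: float,    # sustained HBM bandwidth, in TB/s
    n_params_billions: float,     # model size, in billions of parameters
    bytes_per_param: float = 2.0, # e.g. bf16 weights; ignores KV-cache traffic
) -> float:
    bytes_per_token = n_params_billions * 1e9 * bytes_per_param
    return hbm_bandwidth_tb_s * 1e12 / bytes_per_token

# Example: a hypothetical 70B-parameter model in bf16 on a 4 TB/s accelerator.
print(f"{decode_tokens_per_sec_ceiling(4.0, 70):.0f} tokens/s single-stream ceiling")
```

Doubling bandwidth doubles that ceiling directly, which is consistent with Meta's emphasis on bandwidth rather than additional compute for inference-oriented parts.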

Strategic Move to Reduce Reliance on Nvidia While Maintaining Dual Hardware Approach

The announcement comes just two weeks after Meta disclosed a long-term, $100 billion AI infrastructure agreement with AMD, suggesting a broader effort to reduce reliance on Nvidia across different parts of Meta's AI stack [3]. However, Meta will continue buying chips from other companies, including Nvidia and AMD, as it pursues a dual approach of buying traditional hardware and investing in custom silicon development for specialized tasks [2].

Source: Bloomberg

Song told CNBC that by designing custom chips, which are then manufactured by Taiwan Semiconductor, Meta can squeeze out better price-performance across its data center fleet rather than relying only on vendors [4]. "This also provides us with more diversity in terms of silicon supply, and insulates us from price changes to some extent," Song said. "This is a little bit more leverage" [4]. Meta has already deployed hundreds of thousands of MTIA chips in production, onboarded numerous internal production models, and tested MTIA with large language models like Llama [1].

Hyperscalers Converge on Inference-Optimized Custom Silicon and Shared Software Stack

Meta's announcement puts the company alongside Google, AWS, and Microsoft, each of which has spent the last few years building and scaling custom silicon programs for AI accelerators [1]. Google announced Ironwood, its seventh-generation TPU purpose-built for inference, at Google Cloud Next in April 2025, delivering 192 GB of HBM3E per chip at 7.37 TB/s of memory bandwidth [1]. AWS announced Trainium3, a 3nm chip with 144 GB of HBM3E per chip at 4.9 TB/s of bandwidth, while Microsoft introduced its Maia 200 for inference workloads, built on TSMC 3nm [1].

Broadcom is connecting the dots across many of these programs, having had a hand in building both Google's TPUs and Meta's MTIA family [1]. Meta described the MTIA chips as being developed "in close partnership with" Broadcom, and said that the company "has remained and will continue" to be a key partner of Meta's AI infrastructure strategy [1]. The software stack runs natively on PyTorch, vLLM, and Triton, with support for torch.compile and torch.export so that production models can be deployed simultaneously on both GPUs and MTIA without MTIA-specific rewrites [3]. MTIA supports both eager and graph modes, integrating directly with PyTorch 2.0's compilation pipeline, with compilers built on Torch FX IR, TorchInductor, MLIR, and LLVM [5].
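
In practice, the vendor-neutral pieces of that stack are the standard PyTorch 2.x capture points. The sketch below uses only those public APIs; the toy model and shapes are placeholders, and how MTIA hooks in behind torch.compile and torch.export is internal to Meta:

```python
# Toy illustration of the PyTorch 2.x entry points named in the article.
import torch
import torch.nn as nn

class TinyRanker(nn.Module):
    """Stand-in for a production ranking model (illustrative only)."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyRanker().eval()
example = torch.randn(32, 128)

# Graph mode: TorchDynamo captures FX graphs and hands them to a compiler backend
# (TorchInductor by default); accelerator vendors integrate at this layer.
compiled = torch.compile(model)
with torch.no_grad():
    _ = compiled(example)

# Ahead-of-time capture: torch.export produces a portable ExportedProgram that a
# downstream compiler stack (e.g. one built on FX IR, MLIR, and LLVM) can consume.
exported = torch.export.export(model, (example,))
print(exported)
```

Under that model, targeting a GPU or an MTIA device comes down to which compiler backend consumes the captured graph, which is what lets production models avoid MTIA-specific rewrites.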
