6 Sources
[1]
Meta reveals four new MTIA chips built for AI inference -- to be released on a six-month cadence
The chiplet-based accelerators are designed to run AI inference more efficiently than GPUs optimized for training workloads. Meta today announced four successive generations of its in-house Meta Training and Inference Accelerator (MTIA) chips, all developed in partnership with Broadcom and scheduled for deployment within the next two years. "We've developed a competitive strategy for MTIA by prioritizing rapid, iterative development," reads Meta's press release, "along with an inference-first focus and frictionless adoption by building natively on industry standards."
The four new chips are MTIA 300, 400, 450, and 500. MTIA 300 is already in production for ranking and recommendations training, while 400 is currently in lab testing ahead of data center deployment. MTIA 450 and 500 are targeted at AI inference and are scheduled for mass deployment in early 2027 and later in 2027, respectively. According to Meta's technical blog, from MTIA 300 through to MTIA 500, HBM bandwidth increases 4.5 times and compute FLOPs increase 25 times. Meta says MTIA 450 doubles the HBM bandwidth of MTIA 400, describing it as "much higher than that of existing leading commercial products," or, in other words, Nvidia's H100 and H200. MTIA 500 then adds another 50% HBM bandwidth on top of the 450, along with up to 80% more HBM capacity.
Indeed, it's HBM bandwidth, not raw FLOPs, that is the main bottleneck during the decode phase of transformer inference, while mainstream GPUs are architected to maximize FLOPs for large-scale pre-training. This means they carry a cost and power overhead that Meta says is unnecessary for inference workloads. Meta's approach also includes hardware acceleration for FlashAttention and mixture-of-experts feed-forward network computation, plus custom low-precision data types co-designed for inference. MTIA 450 supports MX4, delivering six times the MX4 FLOPs of FP16/BF16, with mixed low-precision computation that avoids the software overhead of data type conversion.
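Block-scaled "microscaling" (MX) formats such as MX4 store a group of low-bit integers alongside one shared scale factor, which is what makes mixed low-precision computation cheap. The sketch below illustrates the general idea only; the block size, rounding mode, and power-of-two scale encoding are assumptions for illustration, not Meta's actual MX4 format.

```python
import math

def mx_quantize(block, bits=4):
    """Quantize a block of floats to signed ints sharing one
    power-of-two scale (the core idea of microscaling formats)."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit signed values
    amax = max(abs(x) for x in block) or 1.0
    # Shared exponent chosen so the largest value fits in range.
    scale = 2.0 ** math.ceil(math.log2(amax / qmax))
    q = [max(-qmax - 1, min(qmax, round(x / scale))) for x in block]
    return q, scale

def mx_dequantize(q, scale):
    """Recover approximate floats from the shared-scale integers."""
    return [v * scale for v in q]

block = [0.11, -0.52, 0.98, -0.07]
q, scale = mx_quantize(block)
print(q, scale)               # small ints plus one scale per block
print(mx_dequantize(q, scale))
```

The payoff is storage and bandwidth: a 32-value block costs 4 bits per value plus one shared scale, rather than 16 bits per value in BF16, which is why MX4 FLOPs can be multiplied without a matching rise in HBM traffic.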
In terms of eventual deployment, MTIA 400, 450, and 500 will all use the same chassis, rack, and network infrastructure, meaning each new chip generation drops into the existing physical footprint for easy interchange. It's this modularity, Meta says, that's behind MTIA's roughly six-month chip cadence, which itself is much faster than the industry's typical one-to-two year cycle. The software stack runs natively on PyTorch, vLLM, and Triton, with support for torch.compile and torch.export so that production models can be deployed simultaneously on both GPUs and MTIA without MTIA-specific rewrites. Meta said it has already deployed hundreds of thousands of MTIA chips across its apps for inference on organic content and ads. All this comes just two weeks after Meta disclosed a long-term, $100 billion AI infrastructure agreement with AMD, suggesting that there's a broader effort at play to reduce dependence on Nvidia across different parts of Meta's AI stack while keeping MTIA at the core of inference workloads.
[2]
Meta Preparing to Deploy Four New Homegrown Chips to Handle AI
Meta will continue buying chips from other companies, including Nvidia Corp. and Advanced Micro Devices Inc., as it pursues a dual approach of buying traditional hardware and investing in custom silicon for specialized tasks. Meta Platforms Inc. plans to deploy four new generations of its in-house artificial intelligence chips by the end of 2027 as the company turns to custom silicon to help power its rapidly expanding AI workloads. Meta on Wednesday announced plans for the new chips - MTIA 300, MTIA 400, MTIA 450 and MTIA 500 - as part of an effort to diversify its hardware sources, reduce reliance on outside chipmakers and bring down costs amid a fast-moving and expensive AI race. Meta will continue buying chips from other companies as well, and recently announced deals to spend billions on AI hardware from Nvidia Corp. and Advanced Micro Devices Inc.
The MTIA 300 is already in production for content ranking and recommendations training, the company said, and MTIA 400, also known as Iris, has completed lab testing and is moving toward deployment. The MTIA 450 and MTIA 500 chips -- code-named Arke and Astrid, respectively -- are scheduled for mass deployment in 2027. Yee Jiun Song, Meta's vice president of engineering, said that the products are being built in parallel, with the MTIA 450 model expected to arrive early in the year and the MTIA 500 slated six months later.
"If we look at overall AI development, I think even in the last two or three months things have accelerated at a pace that has kind of blown everyone's minds," Song said. "Silicon programs have to keep up with that evolution of workloads, so we're constantly looking at our road maps and making sure we're building what we think will be the most useful products." Meta is spending aggressively to build competitive AI models and products, which has led to unprecedented demand for computing power.
Meta has turned to Nvidia and AMD to power some of its AI efforts, but has also worked to build out its bench of talent focused on chip design in hopes of developing its own products. Last year, after Chief Executive Officer Mark Zuckerberg grew impatient with the company's in-house progress, Meta tried to acquire South Korean chip startup FuriosaAI. After FuriosaAI rejected an $800 million offer, Meta instead acquired Santa Clara, California-based startup Rivos Inc., along with more than 400 of its employees.
The additional headcount has helped Meta's in-house chips team, known as the Meta Training and Inference Accelerator, pursue several projects at once. MTIA is focused on building more efficient computing architecture for the company's internal needs, which range from ranking and recommendations systems used to determine what content appears on users' Instagram feeds to large-scale generative AI inference, where a trained model is used to generate text or images in response to a prompt. While Meta executives have emphasized the benefits of the company building its own chips, it's also one of the biggest buyers of graphics processing units, or GPUs, used for training and running AI models. Its recent agreements with Nvidia and AMD are worth tens of billions of dollars each, and mean Meta has locked in gigawatts of AI capacity over the coming years.
The strategy reflects the company's dual approach of buying more traditional hardware from industry partners while continuing to invest in custom silicon for tasks that are more specialized to Meta's platforms. "We're not building for the general market, so our chips don't need to be as general purpose," Song said. "We can cut out things that we don't need, which really allows us to drive down cost." Still, the economics of chipmaking are challenging. Taking a product from the design phase to manufacturing by a third party - usually Taiwan Semiconductor Manufacturing Co. - can cost billions of dollars and take precious time. Song said it typically takes his team two years to go from design to production. Custom chips usually only pay off at scale and with high utilization rates. Last month, the Information reported that Meta had scrapped its most advanced chip focused on training AI models, known by the code name Olympus, after struggling with its design, shifting instead to focus on a less complicated version. A Meta spokesperson declined to comment on the reporting, but said that the company regularly evaluates and evolves its silicon road map and learns from product deployments. Meta Chief Financial Officer Susan Li said at a conference hosted by Morgan Stanley earlier this month that the company still aims to develop processors that can train AI models.
[3]
Meta rolls out in-house AI chips weeks after massive Nvidia, AMD deals
The specialized silicon is part of the Meta Training and Inference Accelerator, or MTIA, family of chips, which it publicly revealed for the first time in 2023 before unveiling a second-generation version in 2024. Meta Vice President of Engineering Yee Jiun Song told CNBC that by designing custom chips, which are then manufactured by Taiwan Semiconductor, the social media giant can get better price-performance across its data center fleet rather than relying solely on outside vendors. "This also provides us with more diversity in terms of silicon supply, and insulates us from price changes to some extent," Song said. "This is a little bit more leverage." The first new chip, MTIA 300, was deployed a few weeks ago and is intended to help train smaller AI models that underpin Meta's core ranking and recommendation tasks, Song said. These kinds of tasks include showing people relevant content and online ads within the company's family of apps like Facebook and Instagram. The upcoming chips -- MTIA 400, MTIA 450 and MTIA 500 -- are intended for more cutting-edge generative AI-related inference tasks like creating images and videos based on people's written prompts. The chips will not be used for training giant large language models, Song said.
[4]
Meta's Race to Scale AI Chips for Billions: Four Chips in Two Years
Seamless model onboarding: MTIA supports both eager and graph modes. In graph mode, it integrates directly with PyTorch 2.0's compilation pipeline. Developers use familiar tools -- torch.compile and torch.export -- to capture and optimize model graphs. No MTIA-specific rewrites are required to enable models. This portability enables our production models to be deployed simultaneously on both GPUs and MTIA. Compilers: Beneath the PyTorch frontend, MTIA-specific compilers translate high-level graph representations into highly optimized device code. The graph compiler is built on Torch FX IR and TorchInductor. The kernel compiler and lower-level backends are based on Triton, MLIR, and LLVM, enhanced and optimized for MTIA. We improved and tailored TorchInductor's Triton code generations and kernel fusion for MTIA, and introduced MTIA-aware MLIR dialects and Triton DSL extensions. These extensions can be used optionally for performance-critical kernels. The compiler stack has autotuning capabilities that automatically optimize workloads using multiple compilation strategies. Kernel authoring: MTIA supports compiler-driven kernel generation and fusion, enables both auto-generated and user-driven manual kernel authoring using Triton and C++, and provides kernel auto-tuning and optimizations. Furthermore, we have built agentic AI systems to automate kernel generation; see our papers on TritorX and KernelEvolve for details. Communication and transport: MTIA's communication library, Hoot Collective Communications Library (HCCL), is similar to GPU communication libraries but offers several differentiators. It leverages the MTIA chips' built-in network chiplets for efficient communication, offloads collective operations to dedicated message engines, and uses near-memory compute to accelerate reduction-heavy collectives. HCCL also supports fusing compute and collective kernels to minimize latency. 
Finally, its transport stack is optimized for low-latency transactions and offloads the entire data path to reduce host-stack runtime overhead. Runtime and firmware: The MTIA runtime manages device memory, kernel scheduling, and execution coordination across multiple devices. It supports both eager and graph execution modes. Additionally, it orchestrates compute and collective operations in an Inductor-native, eager-style graph mode. This approach enables compute and communication to be captured and scheduled together, providing a GPU-like experience with minimal overhead. The runtime interfaces with a Rust-based user-space driver, rather than a traditional in-kernel Linux driver. The firmware is written in bare-metal Rust, delivering low latency and high performance, with built-in memory and thread safety. vLLM support : vLLM's plugin architecture allows easy integration with MTIA. Our MTIA plugin replaces important operators, such as FlashAttention and fused LayerNorm, with MTIA-specific kernels. Graph-mode execution is supported via a custom torch.compile backend. MTIA inherits and benefits from vLLM's features such as prefill-decode disaggregation and continuous batching. Production tools: To reliably operate hundreds of thousands of MTIA chips in production, MTIA offers production-grade monitoring, profiling, and debugging tools comparable to those available for mainstream GPUs, while providing unique capabilities such as full-stack, at-scale observability across both host and device, spanning software, firmware, and hardware. Its debugger enables fine-grained control, including breakpoints and coordinated stepping at the PE level.
[5]
Meta debuts internally-developed AI chips for inference workloads - SiliconANGLE
Meta Platforms Inc. today revealed that it has designed four custom chips to power its internal artificial intelligence workloads. The company last provided an update about its processor development efforts in 2024. In April of that year, it revealed a custom AI accelerator with an energy footprint of 90 watts. The most advanced of the four accelerators that Meta debuted today has a thermal design power of 1,700 watts.
The custom chip that the company revealed in April 2024, the MTIA 200, was designed solely to run ranking and recommendation models. Those are neural networks that Meta uses to determine what posts and ads to display in users' feeds. The first new chip that it unveiled today, the MTIA 300, is focused on the same use cases. It can provide 1.2 petaflops of performance when processing data in the MX8 format and features 216 gigabytes of HBM memory. "MTIA 300 comprises one compute chiplet, two network chiplets, and several HBM stacks," a group of Meta engineers wrote in a blog post today. "Each compute chiplet comprises a grid of processing elements (PEs), with some redundant PEs to improve yield." The MTIA 300 is the only one of the four newly revealed chips that Meta has already deployed in production.
The three other processors support a broader range of use cases. Besides ranking and recommendation workloads, they can also run generative AI software such as large language models. The most advanced chip in the lineup, the MTIA 500, can provide 10 petaflops of performance when processing MX8 data. It also supports a more efficient data format called MX4. The latter technology reduces the number of bytes that AI models must analyze to answer prompts, which speeds up processing. The MTIA 500 carries out calculations using four logic chiplets. The modules are surrounded by multiple stacks of HBM memory that can together store up to 516 gigabytes of data, more than twice the MTIA 300's 216 gigabytes.
Rounding out the processor's list of components is a so-called SoC chiplet that is responsible for moving information to and from the host server. The MTIA 500 is expected to enter production in 2027 alongside a similar, but less advanced chip called the MTIA 450. Both processors are optimized for generative AI inference workloads. They include circuits that are designed to accelerate specific, hardware-intensive elements of the inference workflow such as FlashAttention. That's a popular implementation of the attention mechanism with which LLMs analyze input data. "At the system level, MTIA 400, 450, and 500 all utilize the same chassis, rack, and network infrastructure," the Meta engineers wrote. "Therefore, each new chip generation can be dropped into the same physical footprint, accelerating the transition from silicon to production deployment. Our modular, reusable designs also minimize the resources needed to develop and deploy multiple chip generations." Meta uses custom compilers to optimize AI models for its MTIA chips. Another custom software module, the Hoot Collective Communications Library, manages the flow of data between the processors. It carries out certain calculations using transistors that are located near memory cells, which reduces data travel times and thereby boosts performance.
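For context on what the FlashAttention circuits accelerate: attention computes softmax(QK^T / sqrt(d))V, and the naive formulation materializes the full score matrix in memory. The reference implementation below shows the computation being fused; FlashAttention produces the same output while processing the scores in tiles so that large intermediate never round-trips through HBM.

```python
import math

def naive_attention(Q, K, V):
    """Reference attention: softmax(Q K^T / sqrt(d)) V, row by row.
    Materializes every row of scores, which is the memory traffic
    that fused kernels like FlashAttention avoid."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Scaled dot-product scores against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        # Numerically stable softmax over the scores.
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        probs = [x / z for x in w]
        # Weighted sum of value vectors.
        out.append([sum(p * v[j] for p, v in zip(probs, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(naive_attention(Q, K, V))  # query attends mostly to the first key
```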
[6]
Why is Meta building its own AI chips, and at what cost?
The timing is awkward, even by Silicon Valley standards. Last week, reports emerged that Meta had quietly killed its most ambitious AI training chip, codenamed Olympus. According to The Information, Meta scrapped the chip after struggling with its design, pivoting instead to a less complicated approach. Meta declined to comment. Then suddenly the company announced four new generations of its homegrown MTIA chips - 300, 400, 450, and 500 - and said nothing about the cancellation. The four chips it announced are all inference-focused, designed to run AI models cheaply at scale, not to train them. That's a meaningful distinction. Training is where Nvidia's stranglehold on the industry is strongest, and it's the arena Meta just quietly retreated from. The official rationale for building custom silicon is straightforward enough. Meta's stated goal is to diversify its hardware sources, reduce reliance on outside chipmakers, and bring down costs amid a fast-moving and expensive AI race. Its MTIA inference chip reportedly reduced total cost of ownership by 40 to 44 percent across recommendation systems for Facebook and Instagram, real savings at a company serving billions of users daily. When you're running AI at that scale, shaving small margins off each inference request adds up astronomically. But the "at what cost" question has a literal answer that complicates the independence narrative. In January 2026, Meta announced a capital expenditure budget of between $115 billion and $135 billion for the year, nearly double the previous year's $72.2 billion, with the majority allocated to chips. And most of that is going straight to the companies Meta says it wants to depend on less.
Within ten days in February, Meta signed a multi-year strategic partnership with Nvidia to deploy millions of Blackwell and next-generation GPUs, followed by a chip agreement with AMD worth between $60 billion and $100 billion. There's also the question of what "homegrown" actually means here. Meta's MTIA chips are developed in close partnership with Broadcom - the same company that co-designs Google's TPUs. The press release calls them Meta's chips. The reality is considerably more collaborative. Meta's chip program has a history of setbacks. It scrapped an earlier inference chip after it underperformed in small-scale testing, and pivoted in 2022 to billions of dollars' worth of Nvidia GPUs. Olympus is just the latest addition to that pattern. Meta's own Chief Product Officer once described the company's chip journey as "a walk, crawl, run situation." Right now, it still looks like a crawl - an expensive, ambitious, necessary one, but a crawl nonetheless.
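The article's 40-to-44-percent TCO-reduction figure can be put in rough perspective with back-of-envelope amortization math. Every number below except that percentage is a hypothetical assumption for illustration; Meta does not disclose its development costs or fleet TCO.

```python
# Hypothetical break-even sketch for custom inference silicon.
# Only the 40% figure comes from the article; the dollar amounts
# are invented assumptions to show the shape of the calculation.

dev_cost = 2.0e9           # assumed design-to-production cost ("billions")
fleet_tco_per_year = 10e9  # assumed annual inference TCO on merchant GPUs
tco_reduction = 0.40       # lower end of the reported 40-44% savings

annual_savings = fleet_tco_per_year * tco_reduction
breakeven_years = dev_cost / annual_savings
print(breakeven_years)  # 0.5: at this assumed scale, payback is fast
```

The same arithmetic also shows why custom chips only pay off at scale: halve the fleet TCO base and the break-even time doubles, which is the utilization argument the Bloomberg piece makes.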
Meta announced four successive generations of its in-house Meta Training and Inference Accelerator chips, developed with Broadcom and scheduled for deployment through 2027. The MTIA 300 is already running in production, while the MTIA 500 promises 25 times more compute power and 4.5 times greater HBM bandwidth. This aggressive push into custom silicon development reflects Meta's strategy to diversify hardware sources and cut costs in the expensive AI race.
Meta has announced four successive generations of its in-house Meta AI chips, marking an aggressive expansion of its custom silicon development efforts to power AI workloads across its platforms. The Meta Training and Inference Accelerator (MTIA) family now includes MTIA 300, 400, 450, and 500, all developed in partnership with Broadcom and manufactured by Taiwan Semiconductor Manufacturing Co.[1][2] The MTIA 300 is already in production for ranking and recommendation models, while MTIA 400 has completed lab testing and is moving toward deployment.[2] The MTIA 450 and MTIA 500 chips are scheduled for mass deployment in early 2027 and later in 2027, respectively.[1]
Source: Meta AI
From MTIA 300 through to MTIA 500, HBM bandwidth increases 4.5 times, and compute FLOPs increase 25 times.[1] The MTIA 450 doubles the HBM bandwidth of MTIA 400, which Meta describes as "much higher than that of existing leading commercial products," a direct reference to Nvidia's H100 and H200 accelerators.[1] The MTIA 500 adds another 50% HBM bandwidth on top of MTIA 450, along with up to 80% more HBM capacity, reaching 516 gigabytes total.[5] These in-house AI chips are specifically architected for AI inference rather than training, addressing the reality that HBM bandwidth, not raw FLOPs, represents the main bottleneck during the decode phase of transformer inference.[1]
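Why decode is bandwidth-bound can be shown with back-of-envelope arithmetic: each generated token must stream roughly all of the model's weights from HBM once, so bandwidth, not FLOPs, caps throughput. The numbers below are illustrative ballparks (a 70B-parameter model in BF16 on an H100-class part), not published MTIA or GPU benchmarks.

```python
# Back-of-envelope: decode throughput ceiling set by HBM bandwidth.
# Illustrative numbers only; not vendor specifications.

params = 70e9             # 70B-parameter model
bytes_per_param = 2       # FP16/BF16 weights
weight_bytes = params * bytes_per_param   # 140 GB of weights

hbm_bw = 3.35e12          # ~3.35 TB/s, H100 SXM-class ballpark

# Each decode step streams roughly all weights once per token,
# so weight traffic alone caps single-stream tokens per second.
max_tokens_per_sec = hbm_bw / weight_bytes
print(round(max_tokens_per_sec, 1))  # ~23.9 tokens/s per device
```

Raising FLOPs does nothing to move this ceiling; only more bandwidth (or fewer bytes per weight, as with MX4) does, which is the design argument behind inference-first silicon.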
Source: Tom's Hardware
The MTIA chips utilize a chiplet-based design, with the MTIA 300 comprising one compute chiplet, two network chiplets, and several HBM stacks.[5] The more advanced MTIA 500 carries out calculations using four logic chiplets surrounded by multiple stacks of HBM memory.[5] Critically, MTIA 400, 450, and 500 all utilize the same chassis, rack, and network infrastructure, meaning each new chip generation drops into the existing physical footprint for easy interchange.[1] This modularity enables MTIA's roughly six-month chip cadence, which is much faster than the industry's typical one-to-two year cycle.[1] Yee Jiun Song, Meta's vice president of engineering, emphasized that silicon programs must keep pace with AI development, which has "accelerated at a pace that has kind of blown everyone's minds" even in recent months.[2]
Source: Bloomberg
The aggressive push into custom silicon reflects Meta's effort to diversify its hardware sources, reduce reliance on outside chipmakers, and bring down costs amid a fast-moving and expensive AI race.[2] Song told CNBC that by designing custom chips, Meta can get better price-performance across its data center fleet and gain "more diversity in terms of silicon supply," insulating it "from price changes to some extent."[3] Meta has already deployed hundreds of thousands of MTIA chips across its apps for inference on organic content and ads.[1] However, Meta will continue buying chips from other companies, including Nvidia and AMD, pursuing a dual approach of traditional hardware purchases alongside custom silicon for specialized tasks.[2] This announcement comes just two weeks after Meta disclosed a long-term, $100 billion AI infrastructure agreement with AMD.[1]
The software stack runs natively on PyTorch, vLLM, and Triton, with support for torch.compile and torch.export so that production models can be deployed simultaneously on both GPUs and MTIA without MTIA-specific rewrites.[1][4] MTIA-specific compilers translate high-level graph representations into optimized device code, with the graph compiler built on Torch FX IR and TorchInductor; the kernel compiler and lower-level backends are based on Triton, MLIR, and LLVM.[4] The MTIA chips include hardware acceleration for FlashAttention and mixture-of-experts feed-forward network computation, plus custom low-precision data types co-designed for inference.[1] MTIA 450 supports MX4, delivering six times the MX4 FLOPs of FP16/BF16.[1] The Hoot Collective Communications Library (HCCL) manages data flow between processors, leveraging built-in network chiplets for efficient communication and offloading collective operations to dedicated message engines.[4]
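The reduction-heavy collectives that HCCL offloads, such as all-reduce, have a well-known ring formulation: each rank's vector is split into chunks, partial sums circulate around the ring, and then the reduced chunks circulate again. The simulation below is a generic illustration of that standard algorithm in plain Python, not Meta's HCCL implementation.

```python
def ring_all_reduce(vectors):
    """Simulate a ring all-reduce: every rank ends up holding the
    elementwise sum of all ranks' vectors. For simplicity the vector
    has exactly one chunk per rank."""
    n = len(vectors)
    assert all(len(v) == n for v in vectors), "one chunk per rank"
    buf = [list(v) for v in vectors]   # buf[rank][chunk]
    # Phase 1: reduce-scatter. After n-1 steps, rank r holds the
    # fully reduced value of chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, buf[r][(r - step) % n])
                 for r in range(n)]            # capture before mutating
        for r, c, val in sends:
            buf[(r + 1) % n][c] += val         # neighbor accumulates
    # Phase 2: all-gather. Circulate the reduced chunks so every
    # rank receives all of them.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, buf[r][(r + 1 - step) % n])
                 for r in range(n)]
        for r, c, val in sends:
            buf[(r + 1) % n][c] = val          # neighbor overwrites
    return buf

result = ring_all_reduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)  # every rank holds [12, 15, 18]
```

Each rank sends 2(n-1)/n of the vector in total, which is why ring all-reduce is bandwidth-optimal and why offloading it to message engines and near-memory compute pays off for reduction-heavy workloads.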
The MTIA 300 was deployed a few weeks ago and is intended to help train smaller AI models that underpin Meta's core ranking and recommendation tasks, including showing people relevant content and online ads within Facebook and Instagram.[3] The upcoming chips—MTIA 400, 450, and 500—are intended for more cutting-edge generative AI inference tasks like creating images and videos based on people's written prompts.[3] Song clarified that the chips will not be used for training giant large language models.[3] "We're not building for the general market, so our chips don't need to be as general purpose," Song explained. "We can cut out things that we don't need, which really allows us to drive down cost."[2] The economics of chipmaking remain challenging, with products typically taking two years to go from design to production and costing billions of dollars, but custom chips can pay off at scale with high utilization rates.[2]

Summarized by Navi