2 Sources
[1]
Broadcom's custom ASIC biz adds South Korea's FuriosaAI to its empire
Broadcom has added FuriosaAI to its list of partners building AI accelerators atop its chip packaging tech. After years quietly serving as the connective tissue behind many modern processors, Broadcom has emerged from the shadows to bask in the glow of the AI bubble. The silicon slinger's latest tie-up aims to adapt FuriosaAI's Tensor Contraction Processor tech into a multi-die system-on-package designed for the high-volume AI inference workloads that are all the rage these days. Details on the new chip remain thin, but FuriosaAI claims that the processor will be fabbed on a 2nm process and make use of "dual layer" HBM4 or HBM4e memory made possible by Broadcom's advanced packaging tech. We've previously explored Broadcom's Extreme Dimension System in Package (3.5D XDSiP) tech, which aims to simplify the process of bringing complex multi-die accelerators similar to AMD's MI300 series to market. The tech effectively disaggregates core compute, memory, I/O, and low level logic into distinct chiplets and then assembles them using 3D packaging techniques, like hybrid bonding, into a single logical chip. Rather than having to design a full chip from the ground up, the offering allows chip designers to focus on the processor's core logic functions, reducing the time, capital, and risk associated with bringing a new part to market. Along with Broadcom's advanced packaging tech, FuriosaAI's third-gen chips will also make use of its Ethernet and PCIe products to support systems exceeding eight chips -- the limit of its current lineup of parts. This implies the use of high-radix Ethernet switches, like Broadcom's Tomahawk 6 (TH6), for either conventional scale out networks or dense scale up networks. While scale up networks have been dominated by proprietary interconnects, like NVLink, Ethernet is gaining traction as an alternative. 
In fact, AMD is tunneling UALink, an emerging alternative to NVLink, over Ethernet, with at least some OEM implementations using Broadcom's TH6 switches. FuriosaAI's latest collab comes roughly a year after the South Korean startup officially launched its second-gen RNGD -- pronounced "renegade" -- accelerators. The PCIe-based cards were fairly modest, offering performance closer to that of high-end workstation GPUs or older datacenter chips than what you'd expect from a modern AI chip. Each RNGD chip offered up to 512 teraFLOPS of dense FP8 compute, 48 GB of HBM3 across two stacks, and 1.5 TB/s of memory bandwidth. To put that in perspective, Nvidia's B200 offered roughly 9x the FP8 FLOPS, 4x the memory capacity, and more than 5x the bandwidth. However, that Nvidia performance comes at the cost of a 1,000-watt TDP, whereas FuriosaAI claims RNGD sips just 180 watts of power from the wall. This lower power envelope meant that even with eight of the chips in a single node, FuriosaAI's systems could easily be deployed in traditional, air-cooled datacenters without needing rack modifications. So while nowhere near as powerful as Nvidia or AMD's latest chips, FuriosaAI's tech has managed to win over several key customers, including LG, which is running its Exaone family of models on RNGD. FuriosaAI is only the latest chip designer to make its relationship with Broadcom public. It's an open secret that IP houses like Broadcom and Marvell have been responsible for designing large pieces of the custom AI silicon that finds its way into hyperscalers' data halls. However, the role Broadcom has played in the success of these chips has been a closely guarded secret until recently. Earlier this year, Meta became one of the first to break the silence, revealing four new AI accelerators under its MTIA portfolio designed with Broadcom's help. 
From images, we can tell the chips are clearly based on Broadcom's XDSiP tech and boast performance on par with or even exceeding that of Nvidia and AMD's next-gen GPUs coming out next year. The MTIA 500, for example, promises 30 petaFLOPS of MXFP4 performance, between 384 and 512 GB of HBM, and up to 27.6 TB/s of memory bandwidth. Broadcom has also made its relationship with Google official. As part of a collaboration with Anthropic in April, Google said that it would supply the US model dev with gigawatts of TPU capacity co-developed with Broadcom. Custom accelerator IP has become big business for Broadcom, accounting for 65 percent of total revenue during the company's first quarter of 2026. ®
[2]
FuriosaAI Ditches GPU Playbook For 2nm Broadcom-Built Inference Chip, Claims HBM4/E Bandwidth Beats Even The Most Efficient GPUs
FuriosaAI and Broadcom have partnered to build a high-performance AI accelerator chip featuring next-gen HBM4/E memory.
FuriosaAI's Next-Gen AI Accelerator Features 2nm Chiplet Architecture, HBM4/E Memory Support for Massive AI Compute Clusters
FuriosaAI has announced its third-generation AI accelerator, which builds upon its 2nd Generation RNGD platform, which is currently in mass production on TSMC's 5nm process technology. The 2nd Gen RNGD AI platform comes in the form of a 180W PCIe-based design, which aims at LLM & Agentic AI workloads. The next-generation design is going to go all-in on the AI inference segment as Agentic AI continues to see huge demand. The third-generation AI accelerator from FuriosaAI has the following highlights:
* The platform pairs 2nm compute technology with HBM4/4E memory, designed to enable high-bandwidth, rack-scale networking across massive AI compute clusters.
* The architecture is optimized for demanding inference workloads with a focus on high-bandwidth data movement that delivers higher performance-per-watt and greater token density than even the most efficient GPUs.
* It builds on Furiosa's current-generation RNGD chip, now in mass production. Customers include Samsung SDS and LG AI Research.
Starting with some of the details shared by FuriosaAI, the chip platform will utilize an advanced 2nm compute die and HBM4/E memory standard. The firm is working with Broadcom to harness advanced packaging capabilities, allowing them to integrate multiple silicon dies into a singular & performant AI chip (System-on-chip). In the teaser shot, the company shows the 3rd Gen AI chip with 12 HBM4/E memory sites, two massive compute chiplets (2nm), and two IO controllers. That rounds up to 432 GB if Furiosa uses 12-Hi 36 GB per stack memory modules. Besides the compute architecture, FuriosaAI will also leverage Broadcom's Ethernet and PCIe IPs, allowing higher bandwidth, rack-scale networking across massive AI compute clusters.
The AI chip is optimized for demanding real-world AI workloads such as post-training sampling, and high bandwidth is a key focus & that's why the company is going with the latest HBM4/E standards. The company claims that its focus on bandwidth rather than thread management (required by GPUs) will help it deliver higher efficiency and higher token throughput than modern GPU designs. Furthermore, the company is saying that its software stack allows developers to deploy new AI models quickly while meeting throughput and latency requirements. Furiosa's SDK leverages a general compiler that automatically maps high-level PyTorch code to silicon. For developers requiring more granular control, Furiosa's Virtual ISA offers a declarative programming model that provides hardware control without the nondeterministic complexity of traditional GPU programming. "Bringing together Broadcom's infrastructure capabilities and Furiosa's Tensor Contraction Processor architecture and its industry-defining software stack allows us to move beyond the chip level and deliver a comprehensive solution for the token factory era," said Furiosa Cofounder and CEO June Paik. As for availability, the 3rd Gen FuriosaAI accelerator is expected to begin sampling by the first half of 2028 and will be ready to meet compute requirements for next-gen AI data centers.
Broadcom has partnered with South Korean startup FuriosaAI to develop a third-generation AI accelerator chip built on a 2nm process with HBM4/HBM4e memory. The custom AI silicon targets high-volume AI inference workloads and leverages Broadcom's advanced packaging technology to integrate multiple chiplets into a single system-on-package, with sampling expected by the first half of 2028.
Broadcom has added South Korean startup FuriosaAI to its growing roster of partners developing custom AI silicon, marking another significant expansion of the chip giant's presence in the AI accelerator market. The collaboration will adapt FuriosaAI's Tensor Contraction Processor technology into a multi-die system-on-package specifically designed for high-volume AI inference workloads that dominate modern data centers [1].
The partnership reflects Broadcom's emergence from behind-the-scenes work to become a visible force in AI infrastructure. Custom accelerator IP has become substantial business for Broadcom, accounting for 65 percent of total revenue during the company's first quarter of 2026 [1]. The company has previously kept its role in designing large pieces of custom ASIC technology for hyperscalers closely guarded, but recent announcements with Meta and Google have brought this work into public view.
FuriosaAI's third-generation AI accelerator chip will be fabricated on a 2nm process and utilize "dual layer" HBM4/HBM4e memory, enabled by Broadcom's advanced chip packaging technology [1]. Teaser images reveal the design incorporates 12 HBM4/HBM4e memory sites, two massive compute chiplets built on the 2nm process, and two IO controllers, potentially delivering 432 GB of memory capacity if the company uses 12-Hi stacks of 36 GB each [2].
The architecture builds upon FuriosaAI's current second-generation RNGD platform, which is in mass production on TSMC's 5nm process technology. The RNGD chips, pronounced "renegade," offer up to 512 teraFLOPS of dense FP8 compute, 48 GB of HBM3 across two stacks, and 1.5 TB/s of memory bandwidth at a claimed 180 watts of power [1]. This efficiency-focused approach has won customers including LG, which runs its Exaone family of models on RNGD, and Samsung SDS [2].
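The capacity and efficiency figures quoted above are simple arithmetic, and a short Python sketch makes the relationships explicit. It uses only numbers from the two source articles; the helper names `hbm_capacity_gb` and `per_watt` are hypothetical, the 36 GB stack size is the article's own conditional assumption, and the per-watt ratios divide peak paper specs by FuriosaAI's claimed 180 W, so they should not be read as measured efficiency.

```python
# Illustrative arithmetic only; every input is a figure quoted in the article.


def hbm_capacity_gb(stacks: int, gb_per_stack: float) -> float:
    """Total HBM capacity, assuming every memory site carries an identical stack."""
    return stacks * gb_per_stack


def per_watt(value: float, watts: float) -> float:
    """Ratio of a peak spec to claimed power; a paper figure, not a measurement."""
    return value / watts


# Third-generation chip: 12 HBM sites. The 36 GB 12-Hi stack size is the
# article's own "if", not a confirmed specification.
next_gen_gb = hbm_capacity_gb(stacks=12, gb_per_stack=36)

# Current RNGD: 48 GB over two stacks, up to 512 dense FP8 teraFLOPS,
# 1.5 TB/s of memory bandwidth, and a claimed 180 W.
rngd_gb_per_stack = 48 / 2
rngd_tflops_per_watt = per_watt(512, 180)
rngd_gbps_per_watt = per_watt(1.5 * 1000, 180)  # TB/s -> GB/s, decimal units

print(f"Third-gen HBM capacity (if 36 GB stacks): {next_gen_gb:.0f} GB")
print(f"RNGD capacity per HBM3 stack: {rngd_gb_per_stack:.0f} GB")
print(f"RNGD peak FP8 per watt: {rngd_tflops_per_watt:.2f} teraFLOPS/W")
print(f"RNGD memory bandwidth per watt: {rngd_gbps_per_watt:.2f} GB/s per W")
```

Swapping in a different stack size or power figure shows how sensitive the headline numbers are to those assumptions.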
Broadcom's Extreme Dimension System in Package (3.5D XDSiP) technology will serve as the foundation for FuriosaAI's new chip. This approach disaggregates core compute, memory, I/O, and low-level logic into distinct chiplets, then assembles them using 3D packaging techniques like hybrid bonding into a single logical chip [1]. The technology simplifies bringing complex multi-die systems-on-package to market, allowing chip designers to focus on core logic functions while reducing time, capital, and risk.
FuriosaAI's third-generation chips will leverage Broadcom's Ethernet and PCIe products to support systems exceeding eight chips, the limit of its current lineup. This implies deployment of high-radix Ethernet switches, potentially Broadcom's Tomahawk 6, for either conventional scale-out networks or dense scale-up networks [1]. The platform pairs 2nm compute technology with HBM4/4E memory to enable high-bandwidth, rack-scale networking across massive AI compute clusters [2].
FuriosaAI claims its architecture, optimized for demanding inference workloads, delivers higher performance-per-watt and greater token density than even the most efficient GPUs. The company's focus on high-bandwidth data movement, rather than the thread management that GPUs require, aims to provide superior efficiency and token throughput for real-world AI workloads such as post-training sampling [2].
"Bringing together Broadcom's infrastructure capabilities and Furiosa's Tensor Contraction Processor architecture and its industry-defining software stack allows us to move beyond the chip level and deliver a comprehensive solution for the token factory era," said Furiosa Cofounder and CEO June Paik [2]. FuriosaAI says its software stack lets developers deploy new AI models quickly while meeting throughput and latency requirements, with its SDK leveraging a general compiler that automatically maps high-level PyTorch code to silicon [2].
The third-generation FuriosaAI accelerator is expected to begin sampling by the first half of 2028 and will be ready to meet compute requirements for next-generation AI data centers [2]. This timeline positions the chip to compete in an increasingly crowded market as hyperscalers and enterprises seek alternatives to dominant GPU providers.
Broadcom's public partnerships now extend beyond FuriosaAI. Earlier this year, Meta revealed four new AI accelerators under its MTIA portfolio designed with Broadcom's help, with the MTIA 500 promising 30 petaFLOPS of MXFP4 performance, between 384 and 512 GB of HBM, and up to 27.6 TB/s of memory bandwidth [1]. Google has also made its relationship with Broadcom official, announcing in April that it would supply Anthropic with gigawatts of TPU capacity co-developed with Broadcom [1]. These collaborations signal growing demand for custom AI silicon tailored to specific workloads rather than general-purpose GPU architectures.
Summarized by Navi