Stanford team builds first commercial 3D chip at US foundry with 4x AI performance gains

Reviewed by Nidhi Govil

2 Sources


A Stanford-led research team has manufactured the first monolithic 3D integrated circuit at a commercial US foundry, showing four-fold performance improvements over conventional flat chips. The prototype, built at SkyWater Technology, vertically stacks memory and logic using carbon nanotube transistors, and the researchers project that the approach could eventually deliver up to 1,000-fold gains in energy-delay product, a combined measure of speed and energy efficiency, for future AI systems.

First Commercial 3D Chip Manufactured at US Foundry

A collaborative research team from Stanford University, Carnegie Mellon University, MIT, and the University of Pennsylvania has fabricated what they claim is the first monolithic 3D integrated circuit at a commercial US foundry, marking a significant milestone for domestic semiconductor development. The prototype 3D chip was manufactured on SkyWater Technology's 200mm production line, demonstrating that advanced vertical chip architectures can transition from academic labs into real-world manufacturing environments [1]. While experimental 3D chips have been built before in university cleanrooms, this is the first time such a design has been produced in a commercial foundry setting with measurable performance advantages.

Source: Tom's Hardware


Vertical Architecture Delivers Order-of-Magnitude Speed Gains

The chip departs from conventional two-dimensional layouts by stacking memory and logic directly on top of one another in a single, continuous process. Instead of assembling multiple finished dies into a package, engineers built each device layer sequentially on the same wafer, using a low-temperature process designed not to damage the circuitry underneath [1]. This creates a dense network of vertical interconnects that dramatically shortens data paths between memory cells and compute units. Early hardware tests show roughly a four-fold improvement in throughput over a 2D implementation with similar latency and footprint [1]. The team reported that, across both hardware tests and simulations, the new architecture delivers order-of-magnitude speed gains over traditional flat designs [2].

Carbon Nanotube Transistors and Resistive RAM Integration

The prototype was manufactured using a mature 90nm to 130nm process that integrates multiple advanced technologies. The stack combines conventional silicon CMOS logic with resistive RAM layers and carbon nanotube field-effect transistors, all fabricated under a thermal budget of about 415°C [1]. This low-temperature approach is critical for building true 3D structures, because it prevents damage to underlying circuit layers during sequential fabrication. The record-setting density of vertical connections and the carefully interwoven mix of memory and computing units help the chip overcome data bottlenecks that have long constrained flat chip designs [2].

Breaking Through the Memory Wall for AI Hardware

The architecture directly addresses what engineers call the "memory wall," the point at which processing speed outpaces a chip's ability to deliver data. On conventional 2D chips, components sit on a single, flat surface with limited, spread-out memory, forcing data to travel across a few long, crowded routes [2]. "By integrating memory and computation vertically, we can move a lot more information much quicker, just as the elevator banks in a high-rise let many residents travel between floors at once," explained Tathagata Srimani, assistant professor at Carnegie Mellon University and senior author of the paper [2]. This capability is particularly valuable for AI hardware, where massive data movement between memory and processors creates significant bottlenecks.

Simulations Project 1000x Improvement in Energy-Delay Product

Beyond the measured hardware results, researchers evaluated taller stacks through simulation to understand future scaling potential. Designs with additional tiers of memory and compute showed up to a twelve-fold performance improvement on AI-style workloads, including models derived from Meta's LLaMA architecture [1]. The group argues that the architecture could eventually deliver 100-fold to 1,000-fold improvements in energy-delay product, a combined metric of speed and efficiency (the energy a task consumes multiplied by the time it takes), by continuing to scale vertical integration rather than shrinking transistors [1]. "This opens the door to a new era of chip production and semiconductor innovation," said Subhasish Mitra, principal investigator at Stanford University. "Breakthroughs like this are how we get to the 1,000-fold hardware performance improvements future AI systems will demand" [2].
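Because energy-delay product multiplies energy by time, improvements in the two factors compound rather than add. A minimal sketch with hypothetical per-task numbers (not figures from the paper) shows how a 10x energy saving combined with a 10x speedup yields a 100x better EDP; the function name and all values are illustrative assumptions.

```python
def energy_delay_product(energy_nj: float, delay_us: float) -> float:
    """Energy-delay product (EDP): energy per task times time per task; lower is better."""
    return energy_nj * delay_us


# Hypothetical per-task figures, chosen only to show how the two factors combine.
baseline = energy_delay_product(energy_nj=50.0, delay_us=20.0)  # 1000.0 nJ*us
improved = energy_delay_product(energy_nj=5.0, delay_us=2.0)    # 10x less energy, 10x faster

gain = baseline / improved
print(f"EDP improvement: {gain:.0f}x")  # prints "EDP improvement: 100x"
```

By the same arithmetic, a 1,000x EDP gain would require, for example, a 10x energy reduction together with a 100x speedup, or roughly 32x in each factor.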

Commercial Viability and Domestic Manufacturing Implications

SkyWater executives involved in the project emphasized the significance of demonstrating that monolithic 3D architectures can be transferred into domestic manufacturing flows. "Turning a cutting-edge academic concept into something a commercial fab can build is an enormous challenge," said Mark Nelson, vice president of technology development operations at SkyWater Technology [1]. The team presented their research at the IEEE International Electron Devices Meeting (IEDM 2025), held December 6 to 10 [1]. For the AI industry, this development suggests a potential path forward as traditional transistor scaling slows, offering a way to keep improving chip performance through vertical integration rather than relying solely on smaller feature sizes.
