AMD launches Instinct MI350P PCIe AI accelerator with 144GB HBM3E to challenge Nvidia's dominance


AMD unveiled the Instinct MI350P, a PCIe AI accelerator card with 144GB of HBM3E memory designed for air-cooled enterprise servers. The dual-slot card delivers up to 4,600 TFLOPS performance and outpaces Nvidia's H200 NVL by roughly 40% in FP16 and FP8 theoretical compute. With support for up to eight cards per system, AMD targets enterprises seeking scalable AI infrastructure without major platform overhauls.

AMD Instinct MI350P Brings High-Performance AI to Standard Servers

AMD has introduced the Instinct MI350P, marking the company's first PCIe-based AI accelerator since the MI210 debuted in 2022 [1]. The new card addresses a critical gap in AMD's portfolio by offering enterprises a drop-in upgrade path for existing infrastructure without requiring specialized accelerator platforms [3]. The launch positions AMD to compete directly with Nvidia's H200 NVL in the enterprise AI market, particularly among organizations exploring on-premise AI deployments.

Source: Wccftech

Technical Specifications Define Enterprise-Ready Performance

The Instinct MI350P features 128 compute units with 8,192 stream processors and 512 Matrix Cores, built on the CDNA 4 architecture using TSMC's 3nm and 6nm FinFET processes. The dual-slot card measures 10.5 inches and integrates 144GB of HBM3E memory delivering 4TB/s of bandwidth across an 8192-bit bus [1]. AMD rates the card at a 600W TBP, though power capping allows operation at 450W in power-constrained environments [2]. The passively cooled design relies on chassis fans in air-cooled enterprise AI servers, making it compatible with standard 19-inch rack-mount configurations [3].

Source: Guru3D

Performance Advantages Over Nvidia's Current PCIe Offerings

AMD claims the MI350P delivers an estimated 2,299 TFLOPS, with peak MXFP4 performance reaching 4,600 TFLOPS, the highest currently available in an enterprise PCIe card [4]. Compared to Nvidia's H200 NVL, the card offers approximately 20% better FP64, 43% better FP16, and 39% better FP8 theoretical compute performance [1]. The accelerator natively supports the lower-precision MXFP6 and MXFP4 formats to accelerate large language models (LLMs), along with FP8, MXFP8, INT8, and BF16 [4]. AMD also promotes the card as capable of handling 200-to-250-billion-parameter models on a single GPU [2].
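A rough capacity check makes the 200-to-250-billion-parameter claim plausible: at 4-bit (MXFP4) weights, even a 250B-parameter model's weights fit inside 144GB, while at FP8 or FP16 they do not. A minimal sketch of that arithmetic (weights only; KV cache and activation overhead are left aside, as the article gives no figures for them):

```python
def weight_footprint_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate GPU memory needed for model weights alone."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB, matching marketing-style capacity figures

HBM_GB = 144  # MI350P memory capacity

for params in (200, 250):
    for bits, name in ((4, "MXFP4"), (8, "FP8"), (16, "FP16")):
        gb = weight_footprint_gb(params, bits)
        verdict = "fits" if gb <= HBM_GB else "exceeds 144GB"
        print(f"{params}B @ {name}: {gb:.0f} GB -> {verdict}")
```

At MXFP4, 250B parameters need about 125GB, leaving roughly 19GB of headroom on the card; the same model at FP8 would need around 250GB, which explains why the per-GPU claim is tied to the new low-precision formats.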

Deployment Flexibility for AI Inference Workloads

The Instinct MI350P supports configurations ranging from one to eight cards per system, enabling data centers to scale performance with workload requirements [1]. AMD targets the card specifically at AI inference workloads, RAG pipelines, and production AI deployments across small, medium, and large enterprise implementations [4]. However, the card relies on PCIe 5.0 for chip-to-chip communication at 128GB/s, lacking the high-speed Infinity Fabric interconnects found on AMD's OAM-based MI350X and MI355X accelerators [3]. This limitation may affect performance in larger multi-GPU configurations compared to systems with dedicated accelerator interconnects.
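The 128GB/s figure is the aggregate (bidirectional) rate of a PCIe 5.0 x16 link quoted at the raw signaling rate; a back-of-envelope sketch, assuming the spec's 32 GT/s per lane and 128b/130b line encoding:

```python
# PCIe 5.0: 32 GT/s per lane, 128b/130b line encoding, x16 link
GT_PER_LANE = 32       # gigatransfers/s; one bit per transfer
ENCODING = 128 / 130   # payload fraction after line encoding
LANES = 16

per_direction_gbs = GT_PER_LANE * ENCODING * LANES / 8  # bits -> bytes
bidirectional_gbs = 2 * per_direction_gbs

print(f"~{per_direction_gbs:.0f} GB/s per direction")  # ~63 GB/s
print(f"~{bidirectional_gbs:.0f} GB/s bidirectional")  # ~126 GB/s
```

The headline 128GB/s comes from the raw rate (32 GT/s x 16 lanes / 8 bits x 2 directions) before encoding overhead; either way it is an order of magnitude below the per-link bandwidth of the Infinity Fabric interconnects on AMD's OAM parts, which is the point of the comparison.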

Source: Phoronix

Open Software Ecosystem Challenges Nvidia's CUDA Dominance

AMD positions the MI350P within its open enterprise AI software environment, supporting ROCm, the Kubernetes GPU Operator, AMD Inference Microservices, and native framework support including PyTorch [4]. The company provides its enterprise AI reference stack to partners without licensing costs, contrasting with proprietary approaches [4]. However, widespread adoption remains uncertain given Nvidia's entrenched position with CUDA in the AI market [1]. AMD continues developing its ROCm software stack to improve competitiveness, though the company faces an uphill battle against established developer ecosystems.

Market Timing and Competitive Landscape

The MI350P launch arrives as Nvidia has not yet announced a PCIe version of its latest B200 Blackwell GPUs with HBM memory, temporarily giving AMD the most advanced AI accelerator in the PCIe form factor [1]. Nvidia currently offers its RTX Pro 6000 Blackwell cards to enterprise customers, which sell for $8,000 to $10,000 but deliver significantly lower specifications: the MI350P provides 2.3x higher peak TFLOPS, 2.5x the memory bandwidth, and 50% more VRAM [3]. While AMD has not disclosed pricing, competitive positioning against both the H200 NVL and the RTX Pro 6000 will prove critical for market penetration. Intel's upcoming Crescent Island AI accelerator with 160GB of LPDDR5X memory will add another competitor later this year, though it lacks the high-bandwidth HBM3E memory of AMD's offering [2]. The MI350P is now available through AMD partners, though the timing is notable given that AMD's MI400 series is expected to launch within the year [2].
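The quoted multipliers against the RTX Pro 6000 can be inverted to see what baseline they imply. A small sketch (the back-computed values are inferences from the article's ratios and the MI350P's headline specs, not confirmed Nvidia figures):

```python
# MI350P headline specs as reported in the article
MI350P = {"peak_tflops": 4600, "bandwidth_tbs": 4.0, "vram_gb": 144}

# Advantage ratios the article quotes vs. Nvidia's RTX Pro 6000
RATIOS = {"peak_tflops": 2.3, "bandwidth_tbs": 2.5, "vram_gb": 1.5}  # "50% more VRAM"

# Dividing out the ratios yields the implied RTX Pro 6000 baseline
implied_rtx = {key: MI350P[key] / RATIOS[key] for key in MI350P}
print(implied_rtx)  # implied baseline: ~2000 TFLOPS, ~1.6 TB/s, ~96 GB
```

The implied baseline of roughly 96GB of VRAM and 1.6TB/s of bandwidth lines up with a GDDR7-class workstation card rather than an HBM accelerator, which is consistent with the article's framing of the two products as different classes of hardware.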
