Curated by THEOUTPOST
On Wed, 25 Sept, 12:06 AM UTC
5 Sources
[1]
Intel Launches Granite Rapids Xeon 6900P series with 120 cores -- matches AMD EPYC's core counts for the first time since 2017
Intel announced the on-time launch of its high-performance Xeon 6 'Granite Rapids' 6900P-series models today, with five new models spanning from 72 cores up to 128 cores, finally exceeding the core counts of AMD's existing EPYC models for the first time since 2017, and even matching the core counts of AMD's soon-to-be-launched Zen 5 Turin processors. As a result of numerous enhancements, Intel claims the 6900P series provides up to 2.1x the performance of AMD's competing 96-core Genoa flagship in the OpenFOAM HPC workload, and up to 5.5x the AI inferencing performance in ResNet50. Intel's claims are impressive against AMD's current-gen models, but AMD has its 3nm EPYC Turin retort with up to 128 performance cores coming next month, setting the stage for a pitched battle for data center sockets throughout the remainder of 2024 and into 2025. Intel unveiled its new Xeon 6 processors at its Enterprise Tech Tour event in Portland, Oregon, last week, and it also showed its exceedingly important next-gen Clearwater Forest for the first time. The new flagship Xeon 6 data center processors come with new CPU core microarchitectures, the Intel 3 process node, and up to 504MB of L3 cache, along with support for 12 memory channels and MRDIMM memory tech that enables speeds reaching up to 8800 MT/s, all of which Intel says contributes to strong gen-on-gen performance and power efficiency gains. Intel shared plenty of benchmarks to back up those claims, which we'll also cover below. Intel has split its Xeon 6 'Granite Rapids' family into several swim lanes. The Granite Rapids 6900P lineup launching today employs all performance cores (P-cores) for latency-sensitive workloads and environments that prize high single-core performance, making them a good fit for HPC, AI, virtualized environments, and general workloads. The five 6900P models range from 72 to 128 cores and stretch up to an unprecedented 500W TDP (AMD's Turin is expected to have similar TDPs). 
Intel's new models also have up to an incredible 504MB of L3 cache, also beating out AMD's current-gen Genoa models. Intel will launch the more general-purpose P-core Xeon 6 models with 86 or fewer cores in the first quarter of 2025 (more info below). Of course, the list of 6900P SKUs only includes 'on-roadmap' models, but Intel also works with partners to deliver custom chip designs based on their needs (AWS is a recent example). In the past, Intel has said that custom models comprise up to 50% of its Xeon sales, but the current distribution is unclear. Intel launched its Xeon 6 'Sierra Forest' 6700E series models earlier this year. These processors come armed with up to 144 efficiency cores (E-cores) for density-optimized environments that prize performance-per-watt. Intel's 6900E models with up to 288 single-threaded efficiency cores will also arrive in Q1 2025, exceeding AMD's core counts for the density-optimized Zen 5c Turin models that will come with 192 cores. However, Turin supports simultaneous multithreading (SMT), so those chips have up to 384 threads -- we'll have to see how the differences pan out in actual benchmarks. Both types of Granite Rapids processors drop into the Birch Stream platform, but Intel also splits this into two distinct branches. The Xeon 6700 E/P series will slot into standard SP server platforms, which support up to either 86 P-cores or 144 E-cores, up to 350W per CPU, eight memory channels, and up to eight sockets per server. The 6900 E/P series models require AP (Advanced Performance) server platforms that support up to 128 P-cores or 288 E-cores, up to 500W per CPU, 12 memory channels, and two sockets per server. Intel's original somewhat exotic AP platforms, which debuted with the Cascade Lake generation, didn't get much uptake due to a limited number of available system designs. 
Intel's executives tell us the demand for AP systems is far more robust given the current need for more performance density, and many more OEMs will bring AP platforms to market. The future Clearwater Forest chips with the 18A process will also be supported on the Birch Stream platform, providing forward compatibility for customers and OEMs. We've covered the Granite Rapids architecture multiple times in the past, but here's a short overview. Intel's 6900P series consists of designs with three compute dies (UCC designator), which are fabbed on the Intel 3 process node and house the CPU cores with the Redwood Cove microarchitecture, caches, interconnect mesh, and memory controllers. Intel isn't disclosing the number of physical CPU cores present on each die, but it's a safe assumption that each die has a minimum of one extra physical CPU core to help defray the impact of defects, thus improving yield. Four memory controllers are attached to each of the compute dies, which can create latency penalties for cross-die accesses. Intel offers both a standard HEX mode that allows accessing all dies as one pool and an SNC3 mode that restricts memory and L3 cache access to each local compute die, thus avoiding the latency impact. This is similar to the traditional Sub-NUMA Clustering (SNC) modes on prior-gen models; we'll put these modes to the test in our pending review. Two I/O dies are included in each chip, regardless of specs, to keep the I/O capabilities consistent across models. The I/O dies are fabbed on the Intel 7 process node and house the PCIe, UPI, and CXL controllers along with the I/O fabric. Intel is releasing the triple-compute-die UCC models today, but the XCC, HCC, and LCC models with fewer cores/die will arrive in Q1 2025. The I/O die also houses the in-built QAT, DLB, DSA, and IAA accelerators, which boost performance in compression, encryption, data movement, and data analytics workloads. 
These functions typically require external I/O, and using the lesser Intel 7 transistors for these functions preserves the more expensive Intel 3 transistors for compute functions. Unfortunately, Intel's accelerator blocks have a vulnerability that renders them unsafe for use in VMs. The vulnerability doesn't cause an issue if the accelerators are unused, but they must be restricted from use in VMs. Phoronix reports this is a hardware issue, meaning a software patch won't address the problem. It appears the issue won't be addressed in silicon until the Diamond Rapids and Granite Rapids D processors arrive. We're following up with Intel for clarification. As we saw with the prior-gen Emerald Rapids, Intel continues to focus on improving power efficiency when the chip is under lighter loads. This focus is because most servers typically operate at anywhere from 30% to 50% utilization, with full-load scenarios being somewhat rare in modern deployments. Through a combination of its newer process node and microarchitecture, along with refined power management optimizations that dynamically modulate multiple facets, including uncore/fabric frequencies, Intel claims up to a 1.9x improvement in performance-per-watt at 40% utilization. However, the impact varies at different load levels, though the entire CPU utilization load-line range shows a marked improvement over the prior-gen Xeon 8592+. This functionality was wrapped in an 'optimized power mode' setting with the prior-gen Xeon models, but Intel says it has now reduced the tradeoffs of this mode to the negligible range. As such, Intel's Xeon 6 now runs in this mode by default. The AP platform has 12 memory channels but only supports one DIMM per channel (1DPC). With standard memory, the platform supports up to DDR5-6400, which increases to 8800 MT/s with specialized Multiplexed Rank DIMMs (MRDIMMs). 
As shown on the right side of the first chart above, Intel claims MRDIMM-8800 offers up to 1.33x more performance in memory throughput-sensitive workloads, like certain AI and HPC applications, over standard DDR5-6400 memory (notably, the ResNet50 benchmark is with the 96-core Xeon 6, not the 128-core model). MRDIMMs are a JEDEC-standard memory (originally championed by AMD) that leverages multiple memory banks operating in lockstep, thus boosting performance beyond a standard memory DIMM. This type of DIMM requires hardware-based support in the memory controller, and Intel claims to be the first to market with support for this new memory tech. Intel says MRDIMMs provide the same or better latency than standard DDR5, but they'll naturally come at a cost premium. MRDIMMs are not to be confused with the performance-boosting MCRDIMMs (originally championed by Intel), which are faster but more complex and aren't officially JEDEC-ratified. Intel supported MCRDIMMs with its previous generation Xeon, but has now shifted to MRDIMMs. Memory capacity and bandwidth are becoming more pressing issues, especially for AI workloads and in-memory databases. Compute Express Link (CXL) is designed to help address those needs (among others). Granite Rapids supports Type 1, Type 2, and Type 3 CXL 2.0 devices, and we can expect this type of support from AMD's future platforms as well. Intel says its customers are most interested in combining economical DDR4 memory with CXL devices to boost memory capacity for DDR5-equipped servers, thus reducing cost (many simply plan to repurpose the DDR4 from their older servers). The CXL consortium initially told us of strong industry interest in this type of arrangement a few years ago, but Intel says it now foresees production-level deployments on the near horizon. Intel feels well positioned for this market due to its unique Flat Memory Mode, largely derived from its learnings with the now-dead Optane DIMMs. 
This feature creates one large pool of memory from both the standard direct-attached DDR5 memory DIMMs and remote Type 3 CXL memory devices with DDR4 connected via the PCIe lanes. Intel's approach is hardware-assisted - the tech is built into the memory controller and isn't software-based like other memory tiering solutions. As such, it doesn't incur any CPU overhead and operates regardless of the host operating system. Intel shared benchmarks showing the benefits of a combined memory pool of DDR4 and DDR5, with the memory controller intelligently placing data on the correct pool based on several variables. The company claims only a 3% performance loss for an in-memory OLAP database - a minimal penalty given that a third of the pool consists of slower DDR4 memory. Above, you can find Intel's performance claims, and we have an album at the end of the article that includes all the benchmarking footnotes. As always with vendor-provided benchmarks, approach these results with caution. Many of Intel's comparisons are against its fifth-gen Xeon models, showing strong improvements in both performance and power efficiency across a broad range of general compute, data and web services, HPC, and AI workloads. Notably, these benchmarks employ varying core counts and memory types (DDR5, MRDIMM) for individual comparisons. Overall, Intel claims 1.2x higher average performance-per-core, 1.6x higher performance-per-watt, and 30% lower average TCO than its fifth-gen Xeon comparables. Naturally, no comparison would be complete without benchmarks against AMD's EPYC. To highlight its claimed advantages in virtualized environments, Intel also provided benchmarks against AMD's fourth-gen EPYC 'Genoa' chips, claiming up to 2.88x more performance in ResNet50 workloads in a 16 vCPU VM workload, along with advantages in a slew of other workloads like BERT-large, LAMMPS, and NGINX, among others. 
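Intel's Flat Memory Mode, described above, tiers data between fast direct-attached DDR5 and slower CXL-attached DDR4 entirely inside the memory controller. The real placement logic is proprietary and hardware-based; purely as an illustration of the general idea, here is a hedged software sketch of an LRU-style two-tier pool in which hot pages live in the fast tier and cold pages spill to the slow tier (all class and variable names are hypothetical):

```python
# Illustrative software model of two-tier memory placement. Intel's
# Flat Memory Mode does this in the memory controller; this sketch only
# mirrors the concept: hot pages stay in the fast (DDR5) tier, cold
# pages get demoted to the slow (CXL/DDR4) tier. Capacities are toy values.
from collections import OrderedDict

class TieredPool:
    def __init__(self, fast_pages, slow_pages):
        self.fast_cap = fast_pages
        self.slow_cap = slow_pages
        self.fast = OrderedDict()   # page_id -> data, kept in LRU order
        self.slow = {}

    def touch(self, page_id, data=None):
        """Access a page, promoting it to the fast tier."""
        if page_id in self.fast:
            self.fast.move_to_end(page_id)      # refresh LRU position
        else:
            if data is None:
                data = self.slow.pop(page_id)   # promote from slow tier
            self.fast[page_id] = data
            if len(self.fast) > self.fast_cap:  # demote the coldest page
                cold_id, cold_data = self.fast.popitem(last=False)
                self.slow[cold_id] = cold_data
        return self.fast[page_id]

pool = TieredPool(fast_pages=2, slow_pages=4)
pool.touch("a", 1); pool.touch("b", 2); pool.touch("c", 3)
# "a" was least recently used, so it has been demoted to the slow tier
print(sorted(pool.fast), sorted(pool.slow))  # ['b', 'c'] ['a']
```

The point of doing this in hardware, as Intel has, is that promotion and demotion decisions happen without the CPU overhead and OS dependence a software loop like this would incur.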
Intel also provided more benchmarks against both AMD's Bergamo and Genoa chips in general compute, data services, and web services. Intel also provided a range of HPC benchmarks against the EPYC 9654 with both standard DDR5 and MRDIMMs, but be sure to pay attention to the footnotes below when assessing these results. AI is soaking up much of the data center spend right now, which will likely continue for the foreseeable future. As such, Intel is keen to demonstrate its advantages in AI workloads with its Advanced Matrix Extensions (AMX), which now also support FP16 in addition to the existing support for INT8 and Bfloat16. Intel sees its AI CPU advantage taking three forms: raw AI compute on the CPU, CPU performance and support when paired with AI GPUs, and performance in vectorized databases that run on the CPU to augment AI training workloads. Intel compared its 96-core Xeon 6972P against the EPYC Genoa 9654 and prior-gen Xeon 8592+ to highlight claimed advantages across a broad range of locally-run AI workloads. Naturally, the real competitor here is AMD's upcoming Turin, but Intel doesn't have those chips available for comparison. Instead, Intel shared a slide that leveraged the Turin AI benchmarks that AMD shared during its Computex keynote. Intel wasn't happy with those benchmarks, and it doubled down on its retort with a new round of benchmarks pitting the new Xeon 6980P against the 128-core Turin, claiming a 2.1x lead in Summarization, a 5.4x lead with a chatbot, and a 1.17x lead with a Translation workload. Naturally, we expect AMD to respond when it launches Turin next month. Intel also wants its chips to be a good pairing not only for systems equipped with Nvidia GPUs but for all servers with discrete accelerators - including its own Gaudi 3. Intel says it has multiple Xeon 6 models qualified for use with Nvidia's MGX systems, citing that as a proof point of its claimed superior CPU performance that pushes AI GPUs to their limits. 
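AMX's job is to accelerate exactly this kind of low-precision matrix math. As a purely illustrative sketch of the INT8 quantization arithmetic such matrix engines speed up - AMX itself is reached through optimized libraries, not hand-written code like this - consider a symmetric-scale quantized dot product (all function names and scale values here are our own, chosen for the example):

```python
# Sketch of the INT8 quantization scheme that AMX-style matrix engines
# accelerate: quantize float inputs to int8, multiply with wide (int32)
# accumulation, then rescale. Pure Python for clarity; real AMX work
# happens inside optimized libraries, not code like this.

def quantize(vec, scale):
    """Map floats to int8 with a simple symmetric scale, clamping to range."""
    return [max(-128, min(127, round(v / scale))) for v in vec]

def int8_dot(a, b, scale_a, scale_b):
    """Dot product computed on int8 values with int32 accumulation."""
    qa, qb = quantize(a, scale_a), quantize(b, scale_b)
    acc = sum(x * y for x, y in zip(qa, qb))   # integer accumulate
    return acc * scale_a * scale_b             # dequantize back to float

a = [0.5, -1.25, 2.0]
b = [1.0, 0.25, -0.75]
exact = sum(x * y for x, y in zip(a, b))       # full-precision reference
approx = int8_dot(a, b, scale_a=2.0 / 127, scale_b=1.0 / 127)
print(exact, approx)  # the quantized result closely tracks the exact one
```

The hardware advantage comes from performing many such int8 multiply-accumulates per cycle in a tile register, which is why Intel's inference claims lean so heavily on these data types.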
Intel also claims that higher single-thread CPU performance, higher I/O performance, and higher memory bandwidth and capacity, along with support for DC-MHS and Nvidia's MGX standards, solidify its position. Finally, Intel also touted its AMX support as an advantage in vector databases. Intel's Scalable Vector Search (SVS) library boosted indexing and search over the EPYC 9654 in the company's benchmarks. These types of databases can be used in tandem with AI RAG workloads, wherein the vector database stores the embeddings for the data set used for training. Naturally, excelling at this type of workload augments the GPUs and could help streamline the training process. Intel's Xeon 6 lineup finally brings it toe-to-toe with AMD's traditional advantage in core counts, but the true story will be told in independent benchmarking and cost analysis of the differing platforms. Notably absent in Intel's presentations? Benchmark comparisons to competing Arm server chips. Arm has steadily clawed its way into the data center, largely through custom models deployed by hyperscalers and cloud providers. That does make direct comparisons a bit tough, but we hope to see some virtualization comparisons against the Arm competition in the future. Intel's Xeon 6900P series launches today worldwide, and the follow-on models come in Q1 2025. We're busy putting a Xeon 6 server platform to the test -- stay tuned for benchmarks.
[2]
Intel Xeon 6900P "Granite Rapids P-Core" Launched: Scaling To 128 Cores, Up To 2.1x In HPC & 5.5x In AI Versus AMD EPYC, Much Faster Than 128 Core Turin "Zen 5" In AI
Intel is officially launching its next-gen Xeon 6900P "Granite Rapids" P-Core only CPUs with up to 128 cores & competitive performance against AMD EPYC. Intel Rolls Out The Big Guns With Xeon 6900P "Granite Rapids" CPUs, Equipped With Up To 128 P-Cores & Leadership In Both HPC & AI Performance Intel's Xeon lineup has been struggling to keep up with AMD's EPYC portfolio for a long while. The competition has been leading in terms of both performance and efficiency, offering higher core counts, a strong feature set, and a continued dedication to ending Intel's dominance in the segment. EPYC has made huge strides, and the past Intel launches have not been received very well, to say the least. However, today, Intel is coming out guns blazing with a next-gen CPU portfolio that features brand-new core technologies and delivers core parity with AMD's latest and greatest. Meet the Xeon 6900P, codenamed Granite Rapids. Back in June, Intel launched its first Xeon 6 family, the 6700E, codenamed Sierra Forest. These CPUs feature up to 144 cores and will receive a 288-core upgrade in early 2025 with the launch of the Xeon 6900E series. With the P-Core family, Intel is starting with the bigger 6900P lineup, featuring the full-fat core configurations to tackle AMD's upcoming EPYC Turin while also delivering strong uplifts versus existing parts such as Emerald Rapids and Genoa. Some of the main features of the Xeon 6900P series include: Intel Xeon 6 CPU Configurations: XCC, HCC, LCC With Up To 288 E-Cores & 128 P-Cores The Intel Xeon 6900 series is a chiplet-heavy design with as many as four chiplets for the Xeon 6900E "Sierra Forest" E-Core CPUs and as many as five chiplets for the Xeon 6900P "Granite Rapids" P-Core CPUs. The Compute Die is made on the "Intel 3" process node and features the Redwood Cove P-Cores along with the IMC, while the I/O die is based on the "Intel 7" process node and has a range of I/O controllers and accelerator engines. 
Also, while the Xeon 6700E CPUs are based on a singular die configuration, the Xeon 6700P & 6900P SKUs will come in three distinct flavors. These include an LCC die with a single compute die for up to 16 cores, the HCC die with a single yet bigger compute die with up to 48 cores, and the XCC die with two compute tiles for up to 86 cores. The XCC tile for the Xeon 6900P CPUs comes in a triple compute tile configuration with up to 128 cores. The silicon itself contains up to 144 cores, but some are disabled to improve yields. Following is how the lineup stacks up: Some of the interesting features of the modular compute die architecture have also been laid out, which include: Intel Xeon 6 Platforms: LGA 7529 For High-End 1S/2S & LGA 4710 For Scalable 1S/8S Configs The higher-end Intel Xeon 6900 "Sierra Forest" and "Granite Rapids" CPUs will feature support on the LGA 7529 socket platform (also known as Birch Stream), with the reference platform known as Avenue City. This platform supports 1S/2S configurations with up to 500W TDP per CPU, 12 memory channels supporting DDR5-6400/MRDIMM-8800 MT/s speeds, up to 96 PCIe Gen 5.0/CXL 2.0 lanes, and up to 6 UPI 2.0 links running at up to 24 GT/s speeds. Following is the maximum CPU config you get on each platform: The Intel Xeon 6900P "Granite Rapids-P" CPUs will also be the first on the market to feature support for Multiplexed Rank DIMMs, or MRDIMMs. These feature up to 8800 MT/s speeds & deliver a substantial uplift versus standard DDR5-6400 configurations. You can get up to 32% improvements in performance across a range of HPC workloads and up to 33% uplifts in AI workloads. The average improvement is up to 21%. As for platforms, Intel will be shipping Xeon 6 processors as host CPUs in a range of solutions featuring its own Gaudi 3 accelerator and is also jointly working with NVIDIA on AI systems such as the MGX and HGX lineups. 
The lineup will scale from up to 72 cores down to 64 cores; however, customers who require higher frequency can have the Xeon 6960P switched down to 48 cores, leaving more headroom for extra clock rates. Intel Xeon 6 CPU Performance: Tackling AMD EPYC Genoa & Next-Gen Turin Performance and efficiency are areas where Intel has a lot of talking points, and the company has a lot of benchmark numbers showcasing its Xeon 6900P against AMD's EPYC family. The blue team is claiming up to 5.5x higher AI inferencing performance and 2.1x higher HPC performance versus AMD's EPYC family. So let's talk about performance and efficiency in detail. First up, Intel is comparing its 5th Gen Emerald Rapids CPUs against the Granite Rapids "Xeon 6900P" family. Across a range of General Compute, Data & Web Services, HPC, and AI workloads, the Xeon 6900P offers a 2.28x improvement in performance & a 60% uplift in efficiency on average. Intel also shows that at a typical 40% server utilization, the performance per watt of its Xeon 6900P can be up to 90% higher than that of the Emerald Rapids flagship. Before shifting gears to the competition numbers, Intel states the following advantages of its Xeon 6900P lineup versus Emerald Rapids: The blue team starts the comparison between AMD's EPYC and its Granite Rapids lineup with AI numbers first. We know that Granite Rapids comes with dedicated accelerator engines that can boost AI performance by a lot, and here, the company is showcasing up to 5.5x AI inference perf gains over AMD EPYC Genoa (9654) CPUs. The average performance gain in AI over EPYC is 3.65x. Intel also fires back at Team Red by using their very own numbers showcased during Computex for 5th Gen EPYC Turin CPUs. Intel first shows the numbers used by AMD for its very own Emerald Rapids chips and points to the latest figures with proper hardware and software optimizations. 
It is showcased that under the right conditions, the Emerald Rapids chips perform far better than what the competition showed during its event, and in one instance, the Chatbot (128 input/output) workload, the Emerald Rapids chip ends up 2% faster than the upcoming Turin chip. The Granite Rapids versus Turin numbers are a completely different story. The Xeon 6980P, when compared to the 128-core EPYC Turin, yields a 34% improvement in Summarization, a 2.15x uplift in Chatbot, and an 18% uplift in Translation workloads. There's also a comparison between the 96-core Xeon 6972P "Granite Rapids" and the 96-core EPYC 9654 "Genoa" CPUs. In Vector Databases, Intel offers up to a 2.71x boost using its AMX instructions, while Intel SVS (Scalable Vector Search) yields a massive 7.34x gain. Intel goes on to compare its latest Xeon 6980P against both AMD EPYC Genoa 96-core & Bergamo 128-core CPUs across a wide range of General Compute and data center-oriented workloads. The Xeon chips yield up to a 3.25x gain over AMD's EPYC family. Intel Xeon 6900 P-Core Granite Rapids CPU SKUs At launch, Intel is introducing five SKUs as part of its Xeon 6900P family. These include the Xeon 6980P, 6979P, 6972P, 6952P & 6960P with 128, 120, 96, 96 & 72 cores, respectively. Let's dive into the specifications. The flagship Intel Xeon 6980P CPU will feature the full 128 P-Cores with 256 threads. This chip will operate at a base clock of 2.0 GHz, a single-core boost clock of up to 3.9 GHz, and an all-core boost clock of up to 3.2 GHz. The chip will house 504 MB of L3 cache with a rated TDP of 500 Watts and offer 12 memory channels, a count that is the same across all SKUs. The fastest-clocked chip in the stack is also the entry-level SKU: the 72-core Xeon 6960P, which has a base clock of 2.7 GHz, a boost clock of up to 3.9 GHz, and an all-core boost of up to 3.8 GHz. The only 400W TDP SKU is the Xeon 6952P with 96 cores and up to 480 MB of L3 cache. 
This chip is rated at a base clock of 2.1 GHz, a boost clock of 3.9 GHz, and an all-core boost clock of up to 3.2 GHz. Intel Granite Rapids "Xeon 6900P" CPU SKUs: Overall, the Intel Xeon 6900P "Granite Rapids" lineup looks like a grand return for the blue team, and it will be nice to see how the chips fare in real-world tests. With the 6900P family, Intel is offering core-count parity with AMD's next-gen Turin classic CPUs and is already claiming wins in various workloads. AMD's Turin launches soon, so we can expect heated competition once again within the server space.
[3]
With Granite Rapids, Intel is back to trading blows with AMD
Over the past few years, we've grown accustomed to Xeon processors that, generation after generation, come up short of the competition on some combination of core count, clock speeds, memory bandwidth, or PCIe connectivity. With the launch of its Granite Rapids Xeons on Tuesday, Intel is finally closing the gap, and it may just be a turning point for a product line that has gained a reputation for too little, too late. The 6900P processor family represents the chipmaker's top tier of datacenter chips with up to 128 full-fat performance cores (P-cores), 256 threads, and clock speeds peaking at 3.9 GHz. That not only puts Granite Rapids at core-count parity with AMD's now year-old Bergamo platform, it makes it a direct competitor to its rival's upcoming Turin Epycs with its 128 Zen 5 cores. To be clear, Turin will actually top out at 192 cores, as CEO Lisa Su was keen to point out during her Computex keynote this spring. However, that part will use a more compact Zen 5C core which trades clocks and presumably per-core cache for compute density. It's also worth noting that Intel already delivered its Bergamo competitor this spring with Sierra Forest. That chip features 144 miniaturized efficiency cores, and a 288-core variant is due out sometime in early 2025. How comparable those cores actually are to AMD's is up for debate, as they lack both simultaneous multithreading and support for AVX-512. Granite Rapids, on the other hand, doesn't suffer those same limitations. But it's not just cores. Granite Rapids has also surpassed AMD on memory bandwidth, and while it still comes up short on I/O, the gap is shrinking. Suffice to say, across its entire Xeon 6 portfolio, Intel is once again trading blows with AMD, something that makes comparing socket-to-socket performance between the two a far less lopsided affair than it's been in years. 
At the heart of Granite Rapids is Intel's Redwood Cove core, which you may recall from its Meteor Lake client CPUs launched last December. On their own, these cores don't deliver much of an instructions-per-cycle (IPC) uplift compared to the Raptor Cove cores found in last year's Emerald Rapids Xeons, amounting to less than 10 percent, Intel Fellow Ronak Singhal told The Register. However, with twice as many of them and 500 watts of socket power at its disposal, Granite Rapids still manages to deliver more than twice the performance of its prior-gen Xeons, at least according to Intel's benchmarks. As always, we recommend taking any vendor-supplied benchmarks with a grain of salt. In this case, Intel is pitting both 96 and 128-core Granite Rapids SKUs with both standard DDR5 and high-performance MRDIMMs against a 64-core Emerald Rapids part. So, we're mostly looking at gen-on-gen gains at the socket level rather than per-core performance here. Granite Rapids sees the largest gains in HPC and AI applications, where the platform delivers between 2.31x and 3.08x higher performance than its predecessor. This isn't surprising considering these workloads generally benefit from larger caches and faster memory. With a substantially larger L3 cache, up to 504 MB, the move to 12 memory channels, and support for both 6,400 MT/s DDR5 and 8,800 MT/s MRDIMMS, Granite Rapids now boasts between 614 GBps and 844 GBps of memory bandwidth. For reference, AMD's Epyc 4 platform topped out at roughly 460 GBps using 4,800 MT/s DDR5. As we've previously discussed, the higher memory bandwidth afforded by MRDIMMs in particular opens the door to running small to mid-sized large language models (LLMs) on CPUs at a much higher performance than was possible on prior gens. The one trade-off that comes with these higher memory speeds, other than price, of course, is Intel only supports them for one DIMM per channel configuration. 
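Those bandwidth figures follow directly from the channel count and transfer rate: each DDR5 channel moves 8 bytes per transfer, so theoretical peak bandwidth is channels × MT/s × 8 bytes. A quick sanity check of the numbers quoted above (decimal GB/s, theoretical peak only - sustained bandwidth will be lower):

```python
# Back-of-the-envelope check of quoted peak memory bandwidth:
# channels × transfer rate (MT/s) × 8 bytes per 64-bit transfer.
def peak_bw_gbps(channels, mts):
    return channels * mts * 8 / 1000  # MB/s -> decimal GB/s

ddr5   = peak_bw_gbps(12, 6400)  # 614.4 GB/s, the lower figure cited
mrdimm = peak_bw_gbps(12, 8800)  # 844.8 GB/s, the MRDIMM figure cited
epyc4  = peak_bw_gbps(12, 4800)  # 460.8 GB/s, matches "roughly 460 GBps"
print(ddr5, mrdimm, epyc4)
```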
Achieving this performance comes at the expense of higher power consumption. Compared to Emerald Rapids, Intel's 6900P-series Xeon 6 processors are sucking up an extra 50-150 watts. Despite this, Intel insists its top-specced component delivers 1.9x higher performance per watt than Emerald at 40 percent utilization. As strange as that might sound, Ryan Taborah, who heads up Intel's Xeon division, argues that comparing power efficiency at 100 percent utilization simply isn't realistic outside of very specific scenarios. "Perf-per-watt really matters where our customers actually target for real-world deployments," he said. "As we talk to customers, most of them care about what is the perf-per-watt at 20 percent, 50 percent, and 80 percent... and, I'd argue, if you look at some of the competitive solutions out there, this is where Xeon shines." Compared to AMD's gen 4 Epyc Genoa platform, Granite Rapids' performance advantage depends heavily on the workload in question. In a head-to-head between Intel and AMD-powered VMs with 16 vCPUs apiece (eight cores/16 threads), Granite Rapids only manages to match Genoa in GCC integer throughput. Whereas in floating point, LAMMPS, and NGINX comparisons, Granite Rapids managed to pull ahead by anywhere from 34 to 82 percent. Meanwhile, in AI inference-centric workloads like BERT-Large and ResNet-50, Intel's 6900P Xeons pull well ahead, no doubt thanks to their AMX accelerator blocks and memory bandwidth advantage. If you're wondering why Intel opted to compare VM performance this way, it likely comes down to how cores are distributed across AMD's Epyc platform. Each of the core-complex dies on AMD's Epyc processors features eight cores and 32 MB of L3 cache. By sizing the VM so it fits entirely within a single die, Intel is arguably presenting a best-case scenario for its competitor, as it avoids the kind of cross-die latency you can run into when running larger VMs on Epyc. 
Speaking of chiplets, let's take a closer look at how Granite is stitched together. By now, Intel is no stranger to chiplet architectures, having shipped its first multi-die Xeons with Sapphire Rapids back in early 2023. But until recently, there's been little consistency in how those chiplets have been arranged. Sapphire Rapids used either one or four dies, while Emerald featured up to two. With the launch of Sierra Forest in spring, we saw Intel transition to a heterogeneous architecture with distinct I/O and compute dies, more akin to what AMD has done since the launch of Epyc Rome in 2019. Intel has carried this formula forward with its Granite Rapids P-core Xeons. But while similar in spirit to AMD's chiplet architecture, it's by no means a clone. At least with the 6900P-series, the chips feature a pair of I/O dies (IOD) based on Intel 7 process tech located at the top and bottom edges of the package. These dies are responsible for PCIe, CXL, and UPI connectivity and also house several accelerators - DSA, IAA, QAT, and DLB to name a few - previously found on the compute die in Sapphire and Emerald. In terms of connectivity, these chips offer up to 96 lanes of PCIe 5.0 per socket as well as support for CXL 2.0. The latter presents the opportunity to inexpensively expand the memory footprint of servers well beyond what's supported by the CPU. For more on CXL, check out The Register's full breakdown here. Sandwiched between the IODs are a trio of compute dies built on the Intel 3 process node. Each of these dies features at least 43 cores - Intel wouldn't say how many are actually on the die - and, depending on the SKU, one or more of them are fused off to achieve the desired core count. For example, on the 128-core parts, two of the dies have 43 active cores, while the third has 42. Whereas for the 72-core part, all three compute dies have 24 cores enabled. 
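The per-SKU splits quoted above (43/43/42 for 128 cores, 24/24/24 for 72) are what you get by spreading the target core count as evenly as possible across the three compute dies. A small sketch of that arithmetic - the 43-core-per-die cap below reflects only the active-core count Intel confirmed, not the unknown physical count:

```python
# Spread a SKU's core count as evenly as possible over three compute
# dies. The per-die cap of 43 is the minimum active-core count Intel
# confirmed per die, used here purely as an illustrative bound.
def distribute(total, dies=3, per_die_max=43):
    base, extra = divmod(total, dies)
    counts = [base + (1 if i < extra else 0) for i in range(dies)]
    assert all(c <= per_die_max for c in counts), "SKU exceeds die capacity"
    return counts

print(distribute(128))  # [43, 43, 42] -- matches the 128-core part
print(distribute(72))   # [24, 24, 24] -- matches the 72-core part
```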
Beyond fewer, denser compute dies, the other thing that sets Intel's chiplet strategy apart from AMD's is that the memory controllers are integrated directly into the compute dies rather than a singular IOD like we see on Epyc. Each of Granite's compute dies features four DDR5/MRDIMM memory channels. In theory, this approach should mean less latency between the memory and compute, but it also means that memory bandwidth scales in proportion to the number of compute dies on board. This isn't something you'll actually need to worry about on the 6900P-series parts, as they all feature the same number of dies. This won't be true of every Granite Rapids part on Intel's Xeon 6 roadmap. Its 6700P-series parts, due out early next year, will feature up to two compute dies on board sporting up to 86 cores and a maximum of eight memory channels. One thing that may come as a surprise to those who haven't deployed high-core-count Sapphire or Emerald Rapids parts before is that, out of the box, each compute die is configured in SNC3 mode and appears as its own non-uniform memory access (NUMA) domain. In other words, while you see one socket, the operating system effectively sees three. Just like in a traditional multi-socket system, this is done intentionally to avoid applications accidentally getting split between NUMA domains and suffering interconnect penalties as a result. However, if you'd prefer the chip to behave like one big NUMA domain, Granite also supports what Intel is calling HEX mode, which does just that. As we mentioned before, using this mode will incur both cache and memory latency penalties. As we alluded to earlier, Intel's 6900P-series chips are only just the latest in a broader portfolio of Xeon 6 processors set to trickle out over the next few quarters. For the moment, Intel's Granite Rapids lineup spans just five SKUs, ranging from a frequency-tuned 6960P with 72 cores to the flagship 6980P with 128. 
If you're curious about Intel's current crop of E-core Xeons, which made their debut this spring, you can find our deep dive here. The remainder of Intel's Xeon 6 roadmap, including its monster 288-E-core 6900E processors and four- and eight-socket-capable 6700P parts, won't arrive until early next year. The 6700P series will no doubt be of interest to those running large, memory-hungry databases like SAP HANA, as it'll be the first generation of high-socket-count Xeons since Sapphire Rapids debuted in early 2023. But with core counts growing by leaps and bounds every generation, and CXL memory offering an alternative means of achieving the memory density these applications require, it may well be Intel's last generation to support more than two sockets.

While Intel is still months away from finalizing its Xeon 6 lineup, the chipmaker is already talking up its next generation of datacenter chips. Dubbed Clearwater Forest, the part is Intel's follow-on to Sierra Forest. We don't know much about the chip just yet, but we do know it'll be the first Intel processor built on the company's state-of-the-art 18A process tech, and that it'll share a similar design with Granite Rapids - three compute dies flanked by a pair of I/O dies, only smaller.

Although Intel has caught up with, and for the moment even surpassed, AMD on core count, the core wars are far from over. As we mentioned earlier, AMD is due to launch its Turin Epycs later this year with 128 Zen 5 or 192 Zen 5C cores, which have already demonstrated a 16 percent IPC uplift across a variety of workloads. What's more, unlike Intel's E-core Xeons, all of AMD's gen-5 Epycs support AVX-512. And then, of course, there's Amazon, Microsoft, and Google, which have all announced custom Arm-based silicon optimized for their workloads with up to 128 cores. Not to be outdone, Arm chip designer Ampere Computing is already working on chips with 256 and even 512 cores.
These higher-core-count parts offer a number of advantages, from enabling moves from dual- to single-socket configurations to allowing large-scale consolidation of aging nodes. However, headwinds to adoption remain. For one, many software licenses are still tied to core count, which may drive customers toward mid-tier parts. Another factor is blast radius: the higher the core count, the bigger the potential impact of a failure. Lose a 32- or 64-core server and it might take down a few workloads; lose a 512-core system and the impact will be far larger. Whether software will evolve to overcome these challenges, or chipmakers will be forced to shift focus back to scaling frequency and driving IPC gains, we'll have to wait and see. ®
[4]
Intel Xeon 6980P "Granite Rapids" Linux Benchmarks Review
With the Intel Xeon 6900P "Granite Rapids" launch today, the review embargo has now expired. I began my Intel Granite Rapids Linux benchmarking a few days ago and have initial benchmarks to share for the flagship Xeon 6980P processors paired with MRDIMM 8800 MT/s memory. This is just the beginning of many Granite Rapids benchmarks to come on Phoronix. Compared to the existing AMD EPYC competition and prior-generation Intel Xeon processors, the Xeon 6900P series surpassed my expectations and has debuted as an incredibly strong performer. In some HPC and other workloads, Intel regains leadership performance with Granite Rapids paired with MRDIMMs; in AI workloads where the software is optimized for AMX, the new Xeon 6900P CPUs can post staggering leads.

The flagship Intel Xeon 6980P features 128 cores / 256 threads, a 2.0GHz base clock, a 3.2GHz all-core turbo frequency, and a 3.9GHz maximum turbo frequency. It packs 504MB of L3 cache and a 500 Watt TDP. As with the rest of the Xeon 6 P-core SKUs, there is support for 12-channel DDR5-6400 memory or MRDIMM 8800 MT/s.

For this launch testing, Intel kindly supplied a review kit based on its Avenue City reference platform. The server shipped with two Xeon 6980P processors and, in my review configuration, arrived with 24 DIMMs of MRDIMM 8800 MT/s memory. For those interested in DDR5-6400 vs. MRDIMM 8800 MT/s performance, that comparison will come in a follow-up article, as Intel will be sending over the DDR5-6400 memory shortly. I am still kicking the tires on this server: due to shipping delays it only arrived toward the end of last week, so it's been a busy several days of benchmarking. Many more benchmarks are on the way in follow-up articles, exploring areas like SMT/HT performance on Granite Rapids, DDR5-6400 vs. MRDIMM 8800, SNC3 vs. HEX clustering modes, compiler comparisons, and a variety of other Linux benchmarks from the Xeon 6980P processors.

The Avenue City reference platform for Granite Rapids does appear to be a lot more nuanced than what we're typically used to seeing from Intel, whose reference servers are usually of good quality and very robust for reference platforms. With Avenue City, Intel spent time during briefings in Oregon going over various quirks and warnings, including the suggestion not to remove the CPUs or RAM until all planned testing is done - hence the lack of Xeon 6980P pictures in this article, paired with the limited time ahead of launch. This was quite surprising given that Intel reference servers have been reliable for years and not typically prefaced with such warnings. Due to those time constraints and warnings, this article looks only at Xeon 6980P performance in a dual-socket (2P) configuration; a follow-up article will have single-socket results once I get around to removing the second CPU.

Even for these 500 Watt TDP CPUs, Avenue City relies on air cooling alone, albeit in a 4U server design rather than the 2U more common for reference platforms. Avenue City runs OpenBMC, as we've seen on recent reference server platforms from both Intel and AMD, and its OpenBMC web interface is nicely themed for Intel compared to the stock interface. When talking with Intel's Ryan Tabrah at the Enterprise Tech Tour, he reinforced that Intel's big server customers continue to be very interested in open-source firmware and open source at the lower levels of the system, for security and the usual benefits we've been touting for years as big open-source firmware proponents. The past few days of testing the Intel Xeon 6980P have gone well, with the Avenue City server behaving itself.
The performance of the Xeon 6980P processors has exceeded my expectations. After years of trailing AMD EPYC, the Xeon 6980P showed that Granite Rapids is very capable and can match - and in various cases outperform - current AMD EPYC Genoa(X) and Bergamo processors. The generational uplift over Intel Xeon Emerald Rapids ranges from good to fantastic, though keep in mind we're doubling the core count going from the Xeon Platinum 8592+ to the Xeon 6980P, alongside the big memory bandwidth upgrade with MRDIMMs. The Xeon 6980P performed very well against AMD's current flagship processors: the 128-core EPYC 9754 Bergamo, the 96-core EPYC 9654 Genoa, and the 96-core EPYC 9684X Genoa-X. AMD's upcoming 5th Gen EPYC "Turin" processors are reported for an October launch, and it will be very interesting to see how Intel Granite Rapids competes with them.

The processors tested for this launch-day testing of Intel Granite Rapids included:

- Xeon Platinum 8380 2P "Ice Lake"
- Xeon Platinum 8490H 2P "Sapphire Rapids"
- Xeon Max 9468 2P "Sapphire Rapids"
- Xeon Max 9480 2P "Sapphire Rapids"
- Xeon Platinum 8592+ 2P "Emerald Rapids"
- Xeon 6766E 2P "Sierra Forest"
- Xeon 6780E 2P "Sierra Forest"
- Xeon 6980P 2P "Granite Rapids"
- EPYC 9654 2P "Genoa"
- EPYC 9684X 2P "Genoa-X"
- EPYC 9754 2P "Bergamo"

All tests were done on Ubuntu 24.04 LTS using a Linux 6.10 kernel and the stock GCC 13 compiler of this Ubuntu Long Term Support release. All processors were tested with their maximum number of memory channels at the maximum rated speed. As mentioned, the Xeon 6980P was tested using MRDIMM 8800 MT/s; DDR5-6400 Granite Rapids benchmarks will come soon on Phoronix to put into perspective how much MRDIMMs help Xeon 6900P performance. Unfortunately, the Linux kernel tested had a RAPL/PowerCap regression leading to inaccurate CPU power readings on Granite Rapids.
Because I only noticed the regression after completing the Granite Rapids runs, and given the limited testing time ahead of launch, today's article lacks the customary CPU power and performance-per-Watt metrics. I'm working on Granite Rapids CPU power efficiency metrics for a follow-up article in the coming days on Phoronix. Apologies for not having power numbers to share today due to this last-minute issue and the limited pre-launch testing time.
[5]
Intel Sees 'Huge' AI Opportunities For Xeon -- With And Without Nvidia
Intel explains why its newly launched Xeon 6900P processors, which scale up to 128 cores and 8,800 megatransfers per second in memory speed, are a big deal for AI computing, whether for CPU-based inferencing or serving as the host CPU in Nvidia-accelerated systems. Intel said its latest Xeon processors present "huge" opportunities for channel partners in the AI computing space, whether the CPUs are used for inferencing or as the head node for systems accelerated by expensive, energy-guzzling chips like Nvidia's GPUs.

The Santa Clara, Calif.-based company on Tuesday marked the launch of its sixth-generation Xeon server CPUs with performance cores, or P-cores for short. Code-named Granite Rapids, the Xeon 6900P processors are designed to deliver the highest performance per core, in contrast with the power-efficiency focus of the higher-density Xeon 6700E chips with efficiency cores (E-cores) that debuted in June. The split into P-core and E-core lines represents a bifurcation in Intel's server CPU architecture that started with this year's sixth generation, known as Xeon 6, which is meant to give organizations the "flexibility to meet any organization's diverse efficiency and performance requirements," the company has previously said.

With the focus on high performance, Intel is promising that the Xeon 6900P processors will provide "significant gains" in performance and performance-per-watt over the fifth-gen Xeon lineup that debuted in late 2023, across a variety of workloads including databases, web services and high-performance computing. But the chipmaker is placing extra emphasis on the AI inference capabilities of the Xeon 6900P lineup, believing there is a large and growing market for businesses to use CPU servers for inference instead of paying for expensive accelerated systems that require substantially more energy but may not be used all the time.
This is part of Intel's two-pronged AI strategy in the data center, where the company also hopes to create demand for its Gaudi 3 accelerator chips, set to launch in the fourth quarter of this year with a focus on cost-efficient performance for generative AI workloads. Ryan Tabrah, vice president and general manager of Xeon and compute products, told CRN at an Intel press event last week that there is a "huge opportunity" for channel partners to help businesses determine whether they need accelerated systems or whether new or existing CPU-based systems could fulfill their AI initiatives.

"Every single market on the planet is being disrupted by AI, and they're all freaking out. I hear horror stories of customers buying certain racks of AI-accelerated racks, and then they end up sitting there and they're not even using them, and they default back to the CPU. So they're really afraid of being left behind, but they also don't want to go buy a bunch of stuff they'll never be able to use and reuse," he said.

The Xeon 6900P chips double the maximum core count over the previous generation to 128 cores, spread across three compute tiles in a chiplet design, continuing Intel's break from the traditional monolithic server-chip designs that started with the fifth generation. Across the five chips in the 6900P segment, the core count goes down to 72. In the first quarter of next year, Intel plans to make a wider range of core-count configurations available, with P-core processors that scale up to 86 cores across two compute tiles, up to 48 cores on one compute tile, and up to 16 cores on a smaller compute tile. The base and turbo frequencies are roughly in line with the previous generation, with the base frequency maxing out at 2.7GHz and the all-core turbo clock speed reaching up to 3.8GHz on the 72-core model. The single-core turbo frequency is 3.9GHz for all five chips.
Compared to fifth-gen Xeon, the 6900P chips increase DDR5 memory speed by 14 percent, to 6,400 megatransfers per second (MT/s), and by 57 percent, to 8,800 MT/s, when using the new DDR5 MRDIMM modules entering the market from Micron. Intel said it's the first company to introduce CPUs that support MRDIMMs - multiplexed rank DIMMs - which improve bandwidth and latency over standard DIMMs. The 6900P series supports six Ultra Path Interconnect 2.0 links for CPU-to-CPU transfer speeds of up to 24 GT/s, up to 96 lanes of PCIe 5.0 and CXL 2.0 connectivity, and, for the first time, the 16-bit floating point (FP16) numerical format for the processor's Advanced Matrix Extensions, which are designed to accelerate AI workloads.

With the CXL 2.0 support, Intel is giving data center operators a new, cost-efficient way to expand memory via a Xeon feature called flat memory mode. The mode is controlled in hardware, and it lets applications treat memory attached via CXL, short for Compute Express Link, as if it were in the same tier as standard DRAM. This allows data center operators to lower total cost of ownership (TCO) - for example, by using lower-cost, lower-performance DDR4 memory in the CXL region and higher-cost, higher-performance DDR5 memory in the DRAM region, according to Ronak Singhal, senior fellow and chief architect of Xeon at Intel.

"The cost of memory is a huge expense to everybody in terms of their TCO, and they're all looking at, how do I reduce their expense on that side? So the result here is that by moving from just a flat memory to this hierarchy of memory using a combination of DDR4 and DDR5, we could do that with minimal performance impact, less than 5 percent performance impact, but the customers are able to get a lower spend on their memory side by taking advantage of our platform," Singhal said in a presentation last week.
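The uplift percentages quoted above check out against fifth-gen Xeon's DDR5-5600 ceiling, and the 12-channel design implies a hefty theoretical peak. A quick sanity check - the peak figures assume standard 8-byte (64-bit) channels and are theoretical maxima, not measured bandwidth:

```python
# Sanity-check the quoted memory uplift: fifth-gen Xeon topped out at
# DDR5-5600; Granite Rapids supports DDR5-6400 and MRDIMM-8800 across
# 12 channels. Peak bandwidth assumes 8-byte channels (theoretical only).

def uplift(new_mts, old_mts=5600):
    """Percentage speed increase over the old transfer rate, rounded."""
    return round((new_mts / old_mts - 1) * 100)

def peak_gbps(mts, channels=12, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s."""
    return mts * channels * bytes_per_transfer / 1000

print(uplift(6400))     # 14 (percent over DDR5-5600)
print(uplift(8800))     # 57
print(peak_gbps(8800))  # 844.8 GB/s theoretical per socket
```

Both quoted percentages fall out directly, and the MRDIMM configuration works out to roughly 845 GB/s of theoretical bandwidth per socket versus about 614 GB/s with DDR5-6400.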
Another change coming with the 6900P series is that Xeon's optimized power mode, which debuted in the fourth generation, will be turned on by default. Singhal said Intel did this because the company "made improvements to that capability such that it's no longer a choice that customers have to make between best performance and best power."

The higher core counts and enhanced capabilities of the 6900P series come with higher power consumption. For all but one of the five chips, the thermal design power (TDP) is 500 watts; the outlier is 400 watts. While these are higher than the 350-watt maximum TDP of fifth-gen Xeon, Singhal said the 6900P chips can still be economical because of their significantly higher core counts. "One of the hallmarks of this new platform is to be able to go to a higher TDP, especially with some of our largest customers. They continue to drive up the power because they're able to get the higher core density, and it still makes TCO sense for them to go up there," said Singhal, who added that the additional Xeon 6 P-core processors arriving early next year will come with TDPs similar to the prior generation's.

Intel hopes the Xeon 6900P series will gain traction in the AI computing market in two ways: through adoption of CPU-only servers for inferencing, and by becoming the top choice for host CPUs in servers accelerated by Nvidia GPUs, Gaudi 3 chips or the like. In the realm of CPU-based inference, Intel said the Xeon 6900P series provides a leap in performance not only over the previous generation but also over AMD's fourth-gen EPYC processors, which debuted in 2022 and will soon be succeeded by a new set of CPUs code-named "Turin" and powered by the rival's Zen 5 architecture. For example, with a 7-billion-parameter Llama 2 chatbot, Intel's 96-core Xeon 6972P is over three times faster than AMD's 96-core EPYC 9654 and 128 percent faster than Intel's own 64-core Xeon Platinum 8592+ from the last generation, according to tests run by Intel.
Intel also showed that the new Xeon holds an even greater edge over the same AMD CPU with an 8-billion-parameter Llama 3 chatbot - four times faster - while maintaining a similar boost over its last-gen Xeon part with the smaller, older model. These two tests were based on chat-style interactions with large language models (LLMs), meaning the defined input and output were each limited to 128 tokens. Tokens typically represent words, characters, or combinations of words and punctuation. When these chips were tested against the same Llama models for summarization, which involved a 1,024-token input and a 128-token output, Intel's new 96-core processor still showed major gains over the two other chips, according to the company. Intel's 96-core Xeon was also significantly faster in the BERT-large language processing model, the DLRM recommendation model, and the ResNet50 image classification model. Singhal said Intel performed these tests using its new 96-core Xeon rather than its flagship 128-core part because the underlying frameworks of these models have been optimized for lower core counts.

The company thinks its 6900P series can beat even AMD's upcoming EPYC Turin processors in CPU-based LLM inferencing. Based on public performance claims released by AMD earlier this year, Intel said its new 128-core Xeon 6980P is 34 percent faster for LLM summarization, 2.15 times faster for an LLM chatbot, and 18 percent faster for LLM translation compared to its rival's 128-core flagship CPU arriving later this year.

"Our performance cores are uniquely architected to deliver significant performance leads in critical growth spaces like AI and HPC and databases, while also delivering lower power consumption so our customers can scale without impacting their power constraints," said Tabrah, the head of Intel's Xeon business.
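The core-density side of that power argument can be made concrete with a rough cores-per-watt comparison against the prior-gen flagship. This is illustrative arithmetic only, using rated TDPs from the article; real efficiency depends on per-core performance and workload.

```python
# Rough cores-per-watt comparison behind the TCO argument: the 500 W
# Xeon 6980P doubles the core count of the 350 W fifth-gen flagship.
# Illustrative arithmetic using rated TDPs only, not measured power.

parts = {
    "Xeon Platinum 8592+ (5th gen)": (64, 350),
    "Xeon 6980P (Granite Rapids)":   (128, 500),
}
for name, (cores, tdp) in parts.items():
    print(f"{name}: {cores / tdp:.3f} cores per rated watt")
# 0.183 vs. 0.256 -- the newer part packs ~40% more cores per rated watt,
# before accounting for any per-core performance or efficiency gains.
```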
Intel also highlighted the performance advantages of the 6900P series for vector databases, which play an important role in retrieval-augmented generation (RAG), a method for grounding generative AI models in existing data sets. Thanks to Intel's Scalable Vector Search software library, the 96-core Xeon 6972P is 84 percent and 2.71 times faster when indexing databases with 100 million and 45 million vectors, respectively, compared to AMD's fourth-gen 96-core EPYC 9654, according to the company's internal tests. It also found the 96-core Xeon to be 2.12 and 7.34 times faster when searching databases with the same respective vector counts. Singhal said this can benefit GPU-accelerated systems running RAG applications because Intel believes the vector databases for this use case mostly run on the CPU.

Citing server data from research firm IDC, Intel estimates that it likely has the largest footprint of host CPUs within accelerated systems, which represent the industry's most powerful servers and are mostly used for training massive AI models. To help protect its CPU dominance in accelerated systems, Singhal said Intel has worked closely with Nvidia to optimize the 6900P series for the AI chip rival's MGX and HGX systems, as well as for upcoming systems using Intel's Gaudi 3 accelerator chips. "I think this is a great example of how we're working with the ecosystem to ensure the best overall AI system deployments," he said.
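For readers unfamiliar with what a vector database actually computes, the kernel behind the searches cited above is nearest-neighbor retrieval: score every stored embedding against a query vector and return the best matches. The sketch below is a deliberately naive brute-force version in pure Python; it is not Intel's Scalable Vector Search library, which relies on compressed vectors, graph indexes, and wide SIMD to reach the cited speeds on hundred-million-vector datasets.

```python
# Minimal brute-force vector search, illustrating the core operation a
# RAG vector database performs. NOT Intel's Scalable Vector Search --
# production engines use graph indexes and SIMD rather than this O(n) scan.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query, corpus, k=2):
    """Return indices of the k corpus vectors most similar to query."""
    ranked = sorted(range(len(corpus)),
                    key=lambda i: cosine(query, corpus[i]),
                    reverse=True)
    return ranked[:k]

corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy 2-D "embeddings"
print(search([1.0, 0.05], corpus))  # [0, 1] -- the two nearest vectors
```

In a RAG pipeline, the returned indices map back to text chunks that get stuffed into the LLM prompt, which is why this CPU-side step sits on the critical path even in GPU-accelerated deployments.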
Intel's new Xeon 6900P series, based on the Granite Rapids architecture, brings up to 128 cores to the table, matching AMD's EPYC core counts for the first time since 2017. The launch marks a significant milestone in the CPU market, with implications for AI and data center performance.
Intel has made a significant leap in the CPU market with the launch of its Xeon 6900P series, based on the Granite Rapids architecture. This new lineup, spearheaded by the Xeon 6980P, boasts up to 128 cores, marking the first time since 2017 that Intel has matched AMD's EPYC processors in core count 1.
The Xeon 6980P, the flagship model of the series, features 128 cores and 256 threads, with a 2.0 GHz base clock, a 3.2 GHz all-core turbo, and a 3.9 GHz maximum turbo. It supports 12 channels of DDR5-6400 memory - or MRDIMMs at up to 8,800 MT/s - alongside 504MB of L3 cache and 96 PCIe 5.0 lanes per socket 2. Initial benchmarks suggest the 6980P substantially outperforms prior-generation Xeons such as the Sapphire Rapids-era 8490H across many workloads 4.
This launch puts Intel in a stronger position to compete with AMD's EPYC processors, which have held the core count advantage for several years. The Xeon 6900P series is expected to challenge AMD's upcoming Turin processors, potentially shifting the balance in the high-performance computing and data center markets 3.
Intel sees significant opportunities for the new Xeon processors in AI applications, both with and without NVIDIA GPUs. The company is positioning its Xeon CPUs as a versatile solution for AI workloads, from CPU-based inference to serving as the host CPU in accelerated systems 5. This strategy aligns with the growing demand for AI-capable hardware in data centers and enterprise environments.
The high core counts come at the cost of higher power: the Xeon 6980P carries a 500W TDP (one model in the series is rated at 400W), up from fifth-gen Xeon's 350W ceiling. Intel argues the added core density still improves overall efficiency and TCO for data centers balancing performance with energy consumption and cooling requirements 1.
The launch of the Granite Rapids Xeon 6900P series represents a significant milestone for Intel and the CPU industry as a whole. As the demand for high-performance computing continues to grow, especially in AI and data-intensive applications, the competition between Intel and AMD is likely to intensify, driving further innovation in the processor market 3.