
Nvidia CEO Jensen Huang announced here at Computex 2023 in Taipei, Taiwan, that the company's Grace Hopper superchips are now in full production, and that the Grace platform has earned six supercomputer wins. These chips are a fundamental building block of one of Huang's other big Computex 2023 announcements: the company's new DGX GH200 AI supercomputing platform, built for massive generative AI workloads, is now available with 256 Grace Hopper Superchips paired together to form a supercomputing powerhouse with 144TB of shared memory for the most demanding generative AI training tasks. Nvidia already has customers like Google, Meta, and Microsoft ready to receive the leading-edge systems.
Nvidia also announced its new MGX reference architectures, which will help OEMs build new AI supercomputers faster, with 100+ systems available. Finally, the company announced its new Spectrum-X Ethernet networking platform, designed and optimized specifically for AI server and supercomputing clusters. Let's dive in.
Nvidia Grace Hopper Superchips Now in Production
We have covered the Grace and Grace Hopper Superchips in depth in the past. These chips are central to the new systems Nvidia announced today. The Grace chip is Nvidia's own Arm CPU-only processor, while the Grace Hopper Superchip combines the 72-core Grace CPU, a Hopper GPU, 96GB of HBM3, and 512GB of LPDDR5X on the same package, all weighing in at 200 billion transistors. This combination provides astounding data bandwidth between the CPU and GPU, with up to 1 TB/s of throughput, offering a tremendous advantage for certain memory-bound workloads.
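Because the CPU and GPU sit behind a coherent NVLink-C2C link, software can treat the package as one pool of memory. The sketch below is a minimal, hypothetical illustration using standard CUDA managed memory (nothing here is a Grace-specific API); on a Grace Hopper part, this kind of CPU-to-GPU page traffic is exactly what the coherent interconnect speeds up:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Multiplies every element of x in place on the GPU.
__global__ void scale(float *x, size_t n, float a) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const size_t n = 1 << 20;
    float *x = nullptr;

    // One allocation, one pointer, valid on both host and device.
    // On Grace Hopper, the coherent NVLink-C2C link services this traffic.
    cudaMallocManaged(&x, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) x[i] = 1.0f;  // CPU writes
    int blocks = (int)((n + 255) / 256);
    scale<<<blocks, 256>>>(x, n, 2.0f);          // GPU updates the same memory
    cudaDeviceSynchronize();

    printf("x[0] = %.1f\n", x[0]);               // CPU reads the GPU's result
    cudaFree(x);
    return 0;
}
```

The same code runs on any CUDA-capable system; the advantage of the Superchip is that the hand-offs between the host loops and the kernel ride an up-to-1 TB/s link rather than PCIe.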
With the Grace Hopper Superchips now in full production, we can expect systems from a bevy of Nvidia's systems partners, like Asus, Gigabyte, ASRock Rack, and Pegatron. More importantly, Nvidia is rolling out its own systems based on the new chips and is issuing reference design architectures for OxMs and hyperscalers, which we'll cover below.
Nvidia DGX GH200 Supercomputer
Nvidia's DGX systems are its go-to system and reference architecture for the most demanding AI and HPC workloads, but the current DGX A100 systems are limited to eight A100 GPUs working in tandem as one cohesive unit. Given the explosion of generative AI, Nvidia's customers are clamoring for much larger systems with much more performance, and the DGX GH200 is designed to offer the ultimate in throughput for massive scalability in the largest workloads, like generative AI training, large language models, recommender systems, and data analytics, by sidestepping the limitations of standard cluster connectivity options, like InfiniBand and Ethernet, with Nvidia's custom NVLink Switch silicon.
Details are still slight on the finer aspects of the new DGX GH200 AI supercomputer, but we do know that Nvidia uses a new NVLink Switch System with 36 NVLink switches to tie together 256 GH200 Grace Hopper chips and 144TB of shared memory into one cohesive unit that looks and acts like one massive GPU. The new NVLink Switch System is based on Nvidia's NVLink Switch silicon, now in its third generation.
The DGX GH200 comes with 256 total Grace Hopper CPU+GPUs, easily outstripping Nvidia's previous largest NVLink-connected DGX arrangement of eight GPUs, and its 144TB of shared memory is nearly 500X more than the DGX A100 systems that offer a 'mere' 320GB of shared memory between eight A100 GPUs. Additionally, expanding the DGX A100 system to clusters with more than eight GPUs requires using InfiniBand as the interconnect between systems, which incurs performance penalties. In contrast, the DGX GH200 marks the first time Nvidia has built an entire supercomputer cluster around the NVLink Switch topology, which Nvidia says provides up to 10X the GPU-to-GPU and 7X the CPU-to-GPU bandwidth of its previous-gen system. It is also designed to deliver 5X the interconnect power efficiency (likely measured as pJ/bit) of competing interconnects, and up to 128 TB/s of bisectional bandwidth.
The system has 150 miles of optical fiber and weighs 40,000 lbs, yet presents itself as one single GPU. Nvidia says the 256 Grace Hopper Superchips propel the DGX GH200 to 1 exaflop of 'AI performance,' meaning that value is measured with the smaller data types that are more relevant to AI workloads than the FP64 measurements used in HPC and supercomputing. This performance comes courtesy of 900 GB/s of GPU-to-GPU bandwidth, which is quite impressive scalability given that Grace Hopper tops out at 1 TB/s of throughput between the Grace CPU and Hopper GPU when they are connected directly together on the same board with the NVLink-C2C chip interconnect.
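As a rough sanity check on that headline figure (our assumption: 'AI performance' here means FP8 Tensor Core throughput with sparsity, roughly 3.96 petaflops per Hopper GPU; Nvidia doesn't spell out the data type in the announcement), the math works out:

$$256 \times 3.96\,\mathrm{PFLOPS} \approx 1{,}014\,\mathrm{PFLOPS} \approx 1\ \mathrm{exaflop}$$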
Nvidia provided projected benchmarks of the DGX GH200 with the NVLink Switch System going head-to-head with a DGX H100 cluster tied together with InfiniBand. Nvidia used varying numbers of GPUs for the workload projections, ranging from 32 to 256, but each system employed the same number of GPUs for each test. As you can see, the explosive gains in interconnect performance are expected to unlock anywhere from 2.2X to 6.3X more performance.
Nvidia will provide the DGX GH200 reference blueprints to its leading customers, Google, Meta, and Microsoft, before the end of 2023, and it will also offer the system as a reference architecture design for cloud service providers and hyperscalers.
Nvidia is eating its own dogfood, too; the company will deploy a new Nvidia Helios supercomputer comprised of four DGX GH200 systems that it will use for its own research and development work. The four systems, which total 1,024 Grace Hopper Superchips, will be tied together with Nvidia's Quantum-2 InfiniBand 400 Gb/s networking.
Nvidia MGX Systems Reference Architectures
While DGX serves the highest-end systems, Nvidia's HGX systems serve hyperscalers. The new MGX systems slot in as the middle ground between the two, and DGX and HGX will continue to co-exist with the new MGX systems.
Nvidia's OxM partners face new challenges with AI-centric server designs that slow design and deployment. Nvidia's new MGX reference architectures are designed to speed that process with 100+ reference designs. The MGX systems comprise modular designs that span the gamut of Nvidia's portfolio of CPUs, GPUs, DPUs, and networking systems, but they also include designs based on the common x86 and Arm-based processors found in today's servers. Nvidia also provides options for both air- and liquid-cooled designs, giving OxMs different design points for a wide range of applications.
Naturally, Nvidia points out that the lead systems from QCT and Supermicro will be powered by its Grace and Grace Hopper Superchips, but we expect that x86 flavors will probably have a wider array of available systems over time. Asus, Gigabyte, ASRock Rack, and Pegatron will all use MGX reference architectures for systems that will come to market later this year into early next year.
The MGX reference designs could be the sleeper announcement of Nvidia's Computex press blast: these are the systems that mainstream data centers and enterprises will eventually deploy to infuse AI-centric architectures into their deployments, and they will ship in far greater numbers than the somewhat exotic and more expensive DGX systems. These are the volume movers. Nvidia is still finalizing the spec, which will be public, and it will release a whitepaper soon.
Nvidia Spectrum-X Networking Platform
Nvidia's purchase of Mellanox has turned out to be a pivotal move for the company, as it can now optimize and tune networking componentry and software for its AI-centric needs. The new Spectrum-X networking platform is perhaps the perfect example of those capabilities, as Nvidia touts it as the 'world's first high-performance Ethernet for AI' networking platform.
One of the key points here is that Nvidia is pivoting to Ethernet as an interconnect option for high-performance AI platforms, as opposed to the InfiniBand connections typically found in high-performance systems. The Spectrum-X design employs Nvidia's 51 Tb/s Spectrum-4 400 GbE Ethernet switches and Nvidia BlueField-3 DPUs, paired with software and SDKs that let developers tune systems for the unique needs of AI workloads. In contrast to other Ethernet-based systems, Nvidia says Spectrum-X is lossless, thus providing superior QoS and latency. It also has new adaptive routing tech, which is particularly helpful in multi-tenancy environments.
The Spectrum-X networking platform is a foundational aspect of Nvidia's portfolio, as it brings high-performance AI cluster capabilities to Ethernet-based networking, opening new options for wider deployments of AI into hyperscale infrastructure. The Spectrum-X platform is also fully interoperable with existing Ethernet-based stacks and offers impressive scalability with up to 256 200 Gb/s ports on a single switch, or 16,000 ports in a two-tier leaf-spine topology.
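Those figures are self-consistent. The first check below confirms the per-switch port count against the quoted switch capacity; the second shows one plausible leaf-spine split that lands near the 16,000-port figure (our illustration, not a topology Nvidia has detailed):

$$256 \times 200\,\mathrm{Gb/s} = 51.2\,\mathrm{Tb/s}, \qquad 128\ \mathrm{leaves} \times 128\ \mathrm{host\ ports} = 16{,}384 \approx 16{,}000\ \mathrm{ports}$$

In that arrangement, each of 128 leaf switches would dedicate half of its 256 ports to hosts and half to uplinks into the spine layer.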
The Nvidia Spectrum-X platform and its associated components, including 400G LinkX optics, are available now.
Nvidia Grace and Grace Hopper Superchip Supercomputing Wins
Nvidia's first Arm CPUs (Grace) are already in production and have made an impact with three recent supercomputer wins, including the newly announced Taiwania 4, which will be built by computing vendor Asus for Taiwan's National Center for High-Performance Computing. This system will feature 44 Grace CPU nodes, and Nvidia claims it will rank among the most energy-efficient supercomputers in Asia when deployed. The supercomputer will be used to model climate change issues.