HGX H100 Systems - Designed for the Largest AI-fused HPC Clusters
Benefits & Advantages
- Double-precision Tensor Cores delivering up to 535/268 teraFLOPS at FP64 in the 8-GPU/4-GPU configurations, respectively
- TF32 precision reaching nearly 8000 teraFLOPS for single-precision matrix multiplication
- Superior thermal design and liquid cooling option support maximum power/performance CPUs and GPUs
- Dedicated networking and storage per GPU with up to double the NVIDIA GPUDirect throughput of the previous generation
- 4 or 8 H100 SXM GPUs interconnected with NVLink at up to 900GB/s
- Dual 4th Gen Intel Xeon Scalable processors
- Supports PCIe 5.0, DDR5, and Compute Express Link (CXL) 1.1+
- Optimized thermal capacity and airflow to support CPUs up to 350W and GPUs up to 700W with air cooling and optional liquid cooling
- PCIe 5.0 x16 networking slots in a 1:1 ratio to GPUs, at up to 400 Gbps each, supporting GPUDirect Storage and RDMA, plus up to 16 U.2 NVMe drive bays for a high-throughput data pipeline and clustering
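The aggregate figures in the list above can be sanity-checked with simple arithmetic. The following is a rough sketch, not a vendor tool: the per-GPU peak values (67 teraFLOPS for FP64 Tensor Core, 989 teraFLOPS for TF32 Tensor Core with sparsity) are assumptions taken from NVIDIA's public H100 SXM datasheet, and the function name `aggregate_peaks` is hypothetical. Because the per-GPU inputs are rounded, the 8-GPU FP64 product comes out at 536 rather than the 535 quoted above.

```python
# Assumed per-GPU peak figures for H100 SXM (rounded datasheet values,
# not stated in this document).
PER_GPU_TFLOPS = {
    "fp64_tensor": 67.0,          # FP64 Tensor Core
    "tf32_tensor_sparse": 989.0,  # TF32 Tensor Core, with sparsity
}

# From this document: one network slot per GPU at up to 400 Gbps.
NIC_GBPS_PER_GPU = 400


def aggregate_peaks(gpu_count):
    """Scale per-GPU peaks linearly to a 4- or 8-GPU node (theoretical maxima only)."""
    peaks = {name: value * gpu_count for name, value in PER_GPU_TFLOPS.items()}
    peaks["network_gbps"] = NIC_GBPS_PER_GPU * gpu_count
    # Convert gigabits per second to gigabytes per second (8 bits per byte).
    peaks["network_gbytes_per_s"] = peaks["network_gbps"] / 8
    return peaks


if __name__ == "__main__":
    for gpus in (4, 8):
        print(gpus, aggregate_peaks(gpus))
```

These are theoretical peaks; sustained throughput on real workloads depends on the model, precision mix, and cluster fabric.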
Accelerate Large Scale AI Training Workloads
Large-scale AI training demands cutting-edge technologies that maximize the parallel computing power of GPUs, handling models with billions if not trillions of parameters trained on massive, exponentially growing datasets.
Leveraging NVIDIA’s HGX™ H100 SXM 4-GPU platform, the fastest NVLink™ and NVSwitch™ GPU-to-GPU interconnects with up to 900GB/s of bandwidth, and the fastest 1:1 networking to each GPU for node clustering, these systems are optimized to train large language models from scratch in the shortest amount of time.
Deliver optimized systems for the most demanding AI, Cloud, and 5G Edge workloads
Enhanced thermal capacity supports the highest-performing CPUs and GPUs, alongside support for the latest industry technologies, including PCIe 5.0, DDR5, CXL 1.1, and high-bandwidth memory.
Systems are designed for optimal airflow to run in high-temperature data center environments up to 40°C, with optional rack-scale liquid cooling solutions and Titanium Level power supplies designed in-house for maximum efficiency.
Improved Security and Manageability
Industry-standard compliance for hardware and silicon Root of Trust (RoT), cryptographic attestation of components throughout the entire supply chain, and comprehensive remote management capabilities.
Supports Open Industry Standards
Future-proofing and interoperability through support for Open Compute Project (OCP) standards, including OCP 3.0, OAM, ORV2, and OSF, as well as OpenBMC and the E1.S storage form factor.