Mellanox (NVIDIA) MQM9790-NS2F InfiniBand Switch in Action | Low-Latency Interconnect Optimization for RDMA/HPC/AI
May 28, 2026
As large-scale AI training clusters and high-performance computing (HPC) centers push network bandwidth and latency requirements to unprecedented levels, traditional Ethernet solutions increasingly struggle with congestion control and unpredictable tail latency under RDMA workloads. A leading national supercomputing center recently faced exactly this challenge when upgrading its next-generation GPU cluster. After evaluating multiple interconnect options, the team selected the Mellanox (NVIDIA) MQM9790-NS2F as the core fabric switch — a decision that fundamentally transformed their cluster’s performance profile.
The supercomputing center’s existing HDR InfiniBand fabric was operating near saturation. With over 2,000 GPUs running parallel AI training jobs, collective communication operations like all-reduce and all-to-all were experiencing significant tail latency spikes. The network had become the primary bottleneck, causing GPU idle time that wasted both computational resources and energy. Engineers estimated that nearly 30% of compute cycles were lost to communication overhead during large-scale distributed training runs.
The team needed a switch capable of delivering 400Gb/s per port, native RDMA support, and in-network computing acceleration, all while maintaining backward compatibility with the existing HDR infrastructure. After reviewing the MQM9790-NS2F datasheet and specifications, they concluded that the switch offered the best balance of port density, performance, and features for their requirements.
The center deployed four MQM9790-NS2F switches (64 OSFP ports, 400Gb/s NDR per port) in a spine-leaf topology, interconnecting 2,048 GPUs across 64 compute nodes. Each node connects via a single OSFP-to-4x100Gb/s splitter cable, providing 400Gb/s of aggregate bandwidth per server while keeping the number of cables to manage low.
| Deployment Parameter | Configuration |
|---|---|
| Switch Model | NVIDIA Mellanox MQM9790-NS2F (4 units) |
| Port Configuration | 64x OSFP, 400Gb/s NDR per port |
| Total GPUs | 2,048 (NVIDIA H100) |
| In-Network Features | SHARPv3, Adaptive Routing, Congestion Control |
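
The port budget behind a leaf-spine design like this one can be estimated with simple arithmetic. The sketch below is illustrative only: the function name, the even split of leaf ports between downlinks and uplinks, and the non-blocking (1:1) assumption are not taken from the center's design documents. It returns the minimum switch count; a real deployment may add switches for redundancy or growth.

```python
import math

def two_tier_switch_count(endpoint_links: int, radix: int = 64) -> tuple[int, int]:
    """Minimum (leaves, spines) for a non-blocking two-tier leaf-spine fabric.

    Assumption: each leaf uses half its ports for endpoints and half for
    uplinks to the spine layer (1:1 oversubscription).
    """
    down_per_leaf = radix // 2
    leaves = math.ceil(endpoint_links / down_per_leaf)
    # Every leaf uplink must terminate on some spine port.
    uplinks = leaves * (radix - down_per_leaf)
    spines = math.ceil(uplinks / radix)
    return leaves, spines

# 64 node-facing links, one per compute node as described above.
print(two_tier_switch_count(64))    # (2, 1)
# One switch port per endpoint at the two-tier limit for 64-port switches.
print(two_tier_switch_count(2048))  # (64, 32)
```

Under these assumptions, 64 node-facing links fit on two leaves and a single spine, so a four-switch fabric leaves headroom for a redundant spine or additional uplinks.
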
Key to the deployment was ensuring that the MQM9790-NS2F interoperates fully with the existing HDR endpoint adapters. The switch’s automatic speed negotiation and link-layer translation allowed a phased migration: legacy nodes continue to operate at HDR speeds while new NDR-capable servers use the full 400Gb/s bandwidth. The center also used SHARPv3 in-network aggregation, which reduced all-reduce traffic by over 65% for the large message sizes common in LLM training.
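
To see why in-network aggregation reduces all-reduce traffic, it helps to compare how many bytes each host must send under the two approaches. The following is a minimal, idealized model, not a description of how SHARPv3 is implemented: it assumes a bandwidth-optimal ring all-reduce as the host-based baseline and assumes that with in-network reduction each host sends its buffer up the switch tree exactly once. The function names are hypothetical.

```python
def ring_bytes_sent(message_bytes: int, ranks: int) -> float:
    """Bytes each rank sends in a bandwidth-optimal ring all-reduce.

    The ring runs a reduce-scatter followed by an all-gather, each moving
    (ranks - 1) / ranks of the message per rank.
    """
    return 2 * (ranks - 1) / ranks * message_bytes

def in_network_bytes_sent(message_bytes: int) -> float:
    """Idealized assumption: each rank sends its buffer to the switch once."""
    return float(message_bytes)

def send_volume_reduction(message_bytes: int, ranks: int) -> float:
    """Fractional cut in per-rank send volume versus the ring baseline."""
    ring = ring_bytes_sent(message_bytes, ranks)
    return 1 - in_network_bytes_sent(message_bytes) / ring

# 1 GiB buffer across 2,048 ranks (rank count taken from the GPU total above).
print(f"{send_volume_reduction(1 << 30, 2048):.1%}")
```

This model approaches a 50% cut in per-host send volume as the rank count grows. Measured reductions, such as the figure the center reports, depend on message sizes, the baseline algorithm, and how traffic is counted, so the model should not be expected to reproduce them.
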
For those evaluating similar upgrades, pricing and availability inquiries for the MQM9790-NS2F have increased significantly among enterprise and research customers. The switch’s 64-port density lowers the number of switches a fabric requires, which improves total cost of ownership and makes it an attractive option for both new builds and refresh projects.
After the migration, the center reported the following results:
- All-reduce latency (1GB message): Reduced from 48µs to 19µs (a reduction of about 60%)
- Effective GPU utilization: Increased from 71% to 93% during large-scale training
- Job completion time (GPT-3 175B equivalent): Shortened by 41%
- Network-induced tail latency (99th percentile): Cut from 210µs to under 35µs
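
The percentages implied by these before-and-after figures can be checked with a few lines of arithmetic. This is only a convenience sketch; the helper names are hypothetical, and the inputs are the values reported in the list above.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage by which a metric dropped, relative to its starting value."""
    return (before - after) / before * 100

def pct_relative_gain(before: float, after: float) -> float:
    """Percentage by which a metric rose, relative to its starting value."""
    return (after - before) / before * 100

print(f"All-reduce latency: {pct_reduction(48, 19):.1f}% lower")        # 60.4% lower
print(f"p99 tail latency:   {pct_reduction(210, 35):.1f}% lower")       # 83.3% lower
print(f"GPU utilization:    {pct_relative_gain(71, 93):.1f}% higher")   # 31.0% higher
```

Note that the utilization change is 22 percentage points in absolute terms, which corresponds to roughly a 31% relative gain over the 71% baseline.
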
The deployment demonstrated that 400Gb/s NDR fabrics can deliver on their theoretical promise in production. The combination of congestion control and adaptive routing eliminated the "incast" collapse patterns that had plagued the previous HDR fabric during all-to-all communication phases.
The supercomputing center’s success with the MQM9790-NS2F has accelerated its roadmap toward exascale AI capabilities. The center is now planning a second phase that will double the GPU count to 4,096 using additional MQM9790-NS2F switches in a three-tier fat-tree topology. The switch’s telemetry and out-of-band management features have also enabled predictive congestion avoidance, reducing operational overhead for the network team.
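
One common reason a growing fabric moves from two tiers to three is the port ceiling of the smaller design. The sketch below computes the maximum number of endpoint ports in a non-blocking fat tree built from identical switches, using the standard result of 2 × (radix / 2)^tiers. It assumes one switch port per endpoint and no oversubscription; the article does not state how many fabric links the second phase will actually need, so this is illustrative only.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Maximum endpoint ports in a non-blocking fat tree of identical switches.

    Each non-top tier spends half its ports on uplinks, so capacity grows
    by a factor of radix / 2 per added tier.
    """
    half = radix // 2
    return 2 * half ** tiers

for tiers in (2, 3):
    print(tiers, max_endpoints(64, tiers))
# 2 2048
# 3 65536
```

With 64-port switches, a two-tier non-blocking fabric tops out at 2,048 endpoint ports under these assumptions, while a third tier raises the ceiling to 65,536.
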
For network architects and IT managers evaluating next-generation fabrics, the NVIDIA Mellanox MQM9790-NS2F represents a mature, production-proven solution. Whether you are building a new AI research cluster or upgrading an existing HPC facility, this switch delivers the low-latency, high-bandwidth foundation required for modern parallel workloads.

