Modern enterprises face significant infrastructure challenges as large language models (LLMs) require processing and moving massive volumes of data for both training and inference. With even the most advanced processors limited by the capabilities of their supporting infrastructure, the need for robust, high-bandwidth networking has become imperative. For organizations aiming to utilize high-performance AI workloads efficiently, a scalable, low-latency network backbone is crucial to maximizing accelerator utilization and minimizing costly, idle resources.
Cisco Nexus 9000 Series Switches for AI/ML workloads
Cisco Nexus 9000 Series Switches deliver the high-radix, low-latency switching fabric that AI/ML workloads demand. For Intel® Gaudi® 3 AI accelerator1 deployments, Cisco has validated specific Nexus 9000 switches and configurations to ensure optimal performance.
The Nexus 9364E-SG2 (Figure 1), for example, is the premier AI networking switch from Cisco, powered by the Silicon One G200 ASIC. In a compact 2RU form factor, it delivers:
- 64 dense ports of 800 GbE (or 128 x 400 GbE / 256 x 200 GbE / 512 x 100 GbE via breakouts)
- 51.2 Tbps aggregate bandwidth for non-blocking leaf-spine fabrics
- 256 MB shared on-die packet buffer, which is critical for absorbing the synchronized traffic bursts characteristic of collective operations in distributed training
- High-radix, 512-port architecture that reduces the number of switching tiers required, lowering latency and simplifying fabric design
- Ultra Ethernet ready: Cisco is a founding member of the Ultra Ethernet Consortium (UEC) and Nexus 9000 switches are forward-compatible with emerging UEC specifications
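As a quick sanity check, the port counts above are internally consistent: every breakout configuration multiplies out to the same 51.2 Tbps of aggregate capacity. A minimal sketch (figures come straight from the list above; the variable names are illustrative):

```python
# Verify that each breakout configuration of the Nexus 9364E-SG2
# multiplies out to the same aggregate bandwidth.
# Port counts and speeds are taken from the bullet list above.

configs = {
    "native 800 GbE": (64, 800),
    "breakout 400 GbE": (128, 400),
    "breakout 200 GbE": (256, 200),
    "breakout 100 GbE": (512, 100),
}

for name, (ports, speed_gbps) in configs.items():
    total_tbps = ports * speed_gbps / 1000  # Gbps -> Tbps
    print(f"{name}: {ports} x {speed_gbps} GbE = {total_tbps} Tbps")
# Every configuration totals 51.2 Tbps, matching the switch's
# aggregate bandwidth.
```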

The Intel Gaudi 3 AI accelerator addresses the need for scalable, open AI systems. It was designed to provide state-of-the-art data center performance for AI workloads, including generative applications like LLMs, diffusion models, and multimodal models. The Intel Gaudi 3 accelerator demonstrates significant improvements over previous generations, delivering up to 4x AI compute performance for Brain Floating Point 16-bit (BF16) workloads and a 1.5x increase in memory bandwidth compared to the Intel Gaudi 2 processor.
A key differentiator is its networking infrastructure: each Intel Gaudi 3 AI accelerator integrates 24 x 200 GbE Ethernet ports, supporting large-scale system expansion with standard Ethernet protocols. This approach eliminates reliance on proprietary networking technologies and provides 2x the networking bandwidth compared to the Intel Gaudi 2 accelerator, enabling organizations to build clusters from a few nodes to several thousand seamlessly.
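Those port counts translate directly into fabric-sizing arithmetic: 24 x 200 GbE gives each accelerator 4.8 Tbps of integrated Ethernet bandwidth. The sketch below is a back-of-the-envelope estimate only; the assumption that three ports per accelerator face the leaf fabric is illustrative, since real deployments split ports between intra-node (scale-up) and inter-node (scale-out) connectivity:

```python
# Back-of-the-envelope fabric sizing for Gaudi 3 scale-out.
# Assumption (illustrative, not a validated design rule): each
# accelerator dedicates `scale_out_ports` of its 24 x 200 GbE
# ports to the leaf fabric.

PORTS_PER_ACCELERATOR = 24
PORT_SPEED_GBPS = 200

def accelerator_bandwidth_tbps() -> float:
    """Total integrated Ethernet bandwidth per Gaudi 3 accelerator."""
    return PORTS_PER_ACCELERATOR * PORT_SPEED_GBPS / 1000

def leaf_ports_needed(accelerators: int, scale_out_ports: int = 3) -> int:
    """200 GbE leaf ports consumed by a pod of `accelerators` nodes."""
    return accelerators * scale_out_ports

print(accelerator_bandwidth_tbps())  # 4.8 Tbps per accelerator
print(leaf_ports_needed(64))         # 192 leaf ports for a 64-accelerator pod
```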
An integrated solution with high performance, scalability, and openness
Cisco Nexus 9364E-SG2 switches and OSFP-800G-DR8 transceivers are certified to support Intel Gaudi 3 AI accelerators in scale-out configurations for LLM training, inference, and generative AI workloads.
Key technical highlights of the validated architecture include:
- High-speed, non-blocking connectivity: 256 x 200 Gbps interfaces on Cisco Nexus 9364E-SG2 switches enable a high-speed, non-blocking network design for interconnecting Intel Gaudi 3 accelerators
- Lossless fabric: Full support for RDMA over Converged Ethernet version 2 (RoCEv2) with Priority Flow Control (PFC) prevents packet loss due to congestion, thereby improving the completion times of distributed jobs
- Simplified operations: Nexus Dashboard provisions scale-out networks for Intel Gaudi 3 AI accelerators using its built-in AI fabric type. It also offers templates for further customization and a single operations platform for all networks accessing an AI cluster.
Cisco Intelligent Packet Flow to optimize AI traffic
AI workloads generate traffic patterns unlike traditional enterprise applications—massive, synchronized bursts, “elephant flows,” and continuous GPU-to-GPU communication that can overwhelm conventional networking approaches. Cisco addresses these challenges with Cisco Intelligent Packet Flow, an advanced traffic management framework built into NX-OS.
Intelligent Packet Flow incorporates multiple load balancing strategies designed for AI fabrics:
- Dynamic load balancing (flowlet-based): Real-time traffic distribution based on link utilization telemetry
- Per-packet load balancing: Packet spraying across multiple paths for maximum throughput efficiency
- Weighted Cost Multipath (WCMP): Intelligent path weighting combined with Dynamic Load Balancing (DLB) for asymmetric topologies
- Policy-based load balancing: Assigns specific traffic-handling strategies to mixed workloads based on ACLs, DSCP markings, or RoCEv2 headers, creating custom-fit efficiency for diverse needs
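The flowlet idea behind dynamic load balancing can be illustrated in a few lines: a flowlet is a burst of packets within a flow, separated from the next burst by an idle gap long enough that re-steering the new burst onto a different path cannot reorder packets. The sketch below is a conceptual illustration only, not Cisco's implementation; the gap threshold and the byte-count load metric are illustrative stand-ins for the real-time utilization telemetry described above:

```python
import time

class FlowletBalancer:
    """Conceptual flowlet-based load balancing (illustrative only).

    A new flowlet starts when the gap since a flow's last packet
    exceeds `gap_threshold_s`; that flowlet is pinned to the
    currently least-loaded link, so packets within a burst stay in
    order while successive bursts spread across available paths.
    """

    def __init__(self, num_links: int, gap_threshold_s: float = 0.0005):
        self.gap_threshold_s = gap_threshold_s
        self.link_load = [0] * num_links   # illustrative: bytes sent per link
        self.flow_state = {}               # flow_id -> (link, last_seen)

    def pick_link(self, flow_id, size, now=None):
        now = time.monotonic() if now is None else now
        link, last_seen = self.flow_state.get(flow_id, (None, None))
        if link is None or now - last_seen > self.gap_threshold_s:
            # Idle gap long enough: start a new flowlet on the
            # least-loaded link without risking packet reordering.
            link = min(range(len(self.link_load)),
                       key=self.link_load.__getitem__)
        self.link_load[link] += size
        self.flow_state[flow_id] = (link, now)
        return link
```

Per-packet load balancing is the degenerate case of a zero gap threshold (every packet is its own flowlet), which maximizes path utilization at the cost of possible reordering; the DLB mode described above additionally feeds link-utilization telemetry into the link-selection step.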
These capabilities work together to minimize job completion time—the critical metric that determines how quickly your AI models train and how efficiently your inference pipelines respond.
Unified operations with Nexus Dashboard
Deploying and operating AI infrastructure at scale requires visibility and other features that go far beyond traditional network monitoring. Cisco Nexus Dashboard serves as the centralized management platform for AI fabrics, providing end-to-end RoCEv2 visibility and built-in templates for AI fabric provisioning.
Key Cisco Nexus Dashboard operational capabilities include:
- Congestion analytics: Real-time congestion scoring, Priority Flow Control and Explicit Congestion Notification (PFC/ECN) statistics, and microburst detection
- Anomaly detection: Proactive identification of performance bottlenecks with suggested remediation
- AI job observability: End-to-end visibility into AI workloads from network to GPUs
- Sustainability insights: Energy consumption monitoring and optimization recommendations
“AI at scale demands both compute efficiency and high-performance AI networking fabric. Intel® Gaudi® 3 AI accelerator combined with Cisco Nexus 9000 switching delivers an optimized, open solution that lets customers build at scale LLM inference clusters with uncompromising cost-efficient performance.”
—Anil Nanduri, VP, AI Get-to-Market & Product Management, Intel
A scalable, compliant, future-ready infrastructure
Cisco Nexus 9000 switches paired with Intel Gaudi 3 AI accelerators provide enterprises with a secure, open, and future-ready network and compute environment. This combination of technologies enables organizations to deploy scalable, high-performance AI clusters that meet both current and emerging workload requirements.
For more information or to evaluate how this reference architecture can be tailored to your organization’s needs, see specifications for Cisco Nexus 9300 Series Switches and Intel Gaudi 3 AI accelerators.
1 Intel, the Intel logo, and Gaudi are trademarks of Intel Corporation or its subsidiaries.

