NVIDIA Spectrum-X

TL;DR

Spectrum-X is NVIDIA's Ethernet platform engineered specifically for AI workloads — pairing the Spectrum-4 switch ASIC (SN5600, 51.2 Tb/s, 64 x 800 GbE, sub-microsecond hop) with the BlueField-3 SuperNIC at the endpoint, jointly implementing AI-tuned RoCEv2 extensions.
It adds three behaviours that vanilla Ethernet lacks: per-packet adaptive routing (packet spraying with end-host re-ordering), AI-tuned congestion control that reacts faster and more finely than stock DCQCN, and tenant performance isolation.
The 32K-GPU reference design, at 800 Gb/s per switch port, competes directly with InfiniBand XDR and delivers comparable AllReduce throughput at roughly 60-70 % of the InfiniBand bill-of-materials cost.
Announced May 2023 alongside NVIDIA's Israel-1 supercomputer; broadly available in DGX H100/H200/B200/GB200 reference architectures from 2024 onward as the certified Ethernet path.
Software stack: Cumulus Linux 5.x on switches, NetQ for fabric telemetry and DOCA 2.7+ on the BlueField-3 endpoints, all operationally familiar to a standard Ethernet team.

Overview

Spectrum-X is NVIDIA's answer to the question of whether Ethernet can be made to behave well enough for AI training. The platform combines the Spectrum-4 ASIC (51.2 Tb/s aggregate switching capacity at 800G port speed) with the BlueField-3 SuperNIC on the endpoint side. Together they implement a set of Ethernet extensions — packet-spraying adaptive routing, AI-tuned per-flow congestion control, and performance isolation — that mimic InfiniBand's lossless behaviour while preserving Ethernet's operational familiarity, multi-vendor switch sourcing, and lower per-port cost.

The platform was introduced at Computex 2023 alongside the Israel-1 supercomputer, an internal NVIDIA reference build of 4,096 H100s on a pure Spectrum-X fabric. By 2024 Spectrum-X had become an officially supported alternative to InfiniBand in DGX SuperPOD reference architectures, with hyperscalers such as Oracle Cloud (OCI Supercluster), several US neoclouds (xAI Colossus, Vultr), and a growing number of sovereign and regional AI clouds standardising on it for new builds. By 2026 the platform is on its second generation of endpoint silicon (BlueField-3 in production, BlueField-4 sampling) and remains the canonical Ethernet-for-AI design.

Spectrum-X is one of the AI-Ethernet options Yobitel evaluates for next-generation NeoCloud regions, alongside InfiniBand XDR. The choice per region is driven by the local availability of operations skills (Cumulus Linux operators versus InfiniBand specialists) and by tenant TCO targets. This entry helps you pick the right AI fabric for your training cluster and explains what Yobitel runs on NeoCloud, including the cost and operational differences from a Quantum-3 XDR build at the same port speed.

Specifications

Authoritative figures for the SN5600 (flagship 800G leaf/spine appliance) and the BlueField-3 SuperNIC (the endpoint). Spectrum-X also includes the SN5400 (400G variant for mixed builds) and the older Spectrum-3 SN4000 family that still ships at 200/400G for storage-tier fabrics.

Property	SN5600 (Spectrum-4)	BlueField-3 SuperNIC
Role	Switch (leaf or spine)	Endpoint NIC + DPU
Aggregate / per-port bandwidth	51.2 Tb/s; 64 x 800 GbE	Up to 400 Gb/s total (single- or dual-port)
Port count	64 x 800 GbE (or 128 x 400 GbE split)	1 or 2 ports
Connector	OSFP	QSFP-DD / OSFP
Switch latency	~600 ns hop (cut-through)	n/a
NIC PCIe	n/a	Gen5 x16
Silicon	Spectrum-4 ASIC	16 Arm Cortex-A78 cores + ConnectX-7 inline NIC
Form factor	2U appliance	FHHL or HHHL PCIe; OCP NIC 3.0
Power (typical / max)	1,000 W / 1,500 W	75-150 W
Software	Cumulus Linux 5.x or SONiC	DOCA-Host 2.7+ on host kernel
AI-tuned RoCE	Packet spraying + AI ECN	End-host re-ordering, AI CC
First shipments	2023 (SN5600), 2024 (volume)	2023
Latest firmware (2026)	Cumulus 5.10+, NOS-X 1.x	BSP 4.7+

Architecture: what makes Spectrum-X different from generic Ethernet

Spectrum-X is, in essence, RoCEv2 with three AI-specific enhancements that close most of the technical gap to InfiniBand. Operators who already understand vanilla RoCEv2 (see the RoCEv2 entry) need to layer these three behaviours on top.

Collective-aware adaptive routing (packet spraying). Spectrum-4 spreads RoCEv2 elephant flows (training AllReduce, AllToAll) across all available equal-cost paths on a per-packet basis and relies on the BlueField-3 SuperNIC at the receiver to restore packet order before handing data to the host's RDMA engine. The switch tells these training collectives apart from mice flows (control plane, IPMI, monitoring) by inspecting the RoCEv2 Base Transport Header (BTH); mice continue to use static ECMP hashing, which keeps their ordering stable. The result: a 4,096-GPU AllToAll that would be bounded by its most congested path on a static-ECMP RoCE fabric instead spreads almost uniformly across all spine uplinks.
AI-tuned congestion control. Stock DCQCN is well tuned for storage and microservices traffic but under-responds to the bursty, synchronised, correlated traffic of training collectives. The Spectrum-X variant feeds BlueField-3 hardware telemetry (per-QP one-way delay, instantaneous receive-buffer occupancy, CNP rate) into a per-flow rate-control loop that reacts in microseconds rather than milliseconds, much closer to InfiniBand's credit-based behaviour. The loop is tunable via DOCA, but the defaults are already set for AI workloads.
Tenant performance isolation. In a multi-tenant pod (multiple training jobs sharing one fabric), noisy-neighbour effects on tail latency are the main threat. Spectrum-X uses per-tenant priority queues and dedicated buffer pools on Spectrum-4 to bound how much fabric capacity any one tenant can consume, plus per-tenant ECN marking thresholds, so Job A's AllReduce burst no longer adds 500 µs to Job B's tail latency.
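A minimal simulation makes the spray-and-re-order idea concrete. This is a plain-Python model, not NVIDIA's implementation: it assumes round-robin spraying over `num_paths` equal-cost uplinks, a hypothetical random queueing delay per path, no packet loss, and PSNs that start at 0 and never wrap (real RoCEv2 PSNs are 24-bit and do wrap). `spray` and `reorder` are illustrative names.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    psn: int        # packet sequence number (carried in the RoCEv2 BTH)
    path: int       # equal-cost uplink the switch sprayed this packet onto
    arrival: float  # simulated arrival time at the receiver, in µs


def spray(num_packets: int, num_paths: int, seed: int = 0) -> list[Packet]:
    """Spray packets round-robin across paths; return them in receiver arrival order."""
    rng = random.Random(seed)
    # Hypothetical per-path queueing delay: some uplinks are busier than others.
    path_delay = [rng.uniform(1.0, 3.0) for _ in range(num_paths)]
    packets = []
    for psn in range(num_packets):
        path = psn % num_paths
        send_time = psn * 0.1  # packets leave the sender 0.1 µs apart
        arrival = send_time + path_delay[path] + rng.uniform(0.0, 0.5)
        packets.append(Packet(psn, path, arrival))
    return sorted(packets, key=lambda p: p.arrival)


def reorder(arrivals: list[Packet]) -> list[int]:
    """Receiver-side re-order buffer: hold early packets, release PSNs strictly in order."""
    expected = 0
    held: dict[int, Packet] = {}
    delivered: list[int] = []
    for pkt in arrivals:
        held[pkt.psn] = pkt
        while expected in held:  # drain every consecutive PSN that is now available
            delivered.append(held.pop(expected).psn)
            expected += 1
    return delivered


arrivals = spray(num_packets=32, num_paths=8)
print([p.psn for p in arrivals][:8])  # wire order: typically not 0..7
print(reorder(arrivals)[:8])          # delivered to the RDMA engine: [0, 1, 2, 3, 4, 5, 6, 7]
```

Holding early packets until the gap closes is what makes per-packet spraying safe for RDMA; without a working re-order stage, the RDMA engine sees the raw wire order instead.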

Beyond the switch, the BlueField-3 SuperNIC does the heavy lifting at the endpoint: it re-orders sprayed packets, runs the AI CC algorithm in hardware, and emits per-flow telemetry for the closed-loop tuning. Without BlueField-3 endpoints, a Spectrum-X fabric is little more than standard RoCEv2; the gains come from optimising switch and NIC together.
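The closed loop can be illustrated with a generic delay-based controller. This is explicitly not NVIDIA's Spectrum-X congestion-control algorithm, which is not specified here; it is a textbook additive-increase / multiplicative-decrease loop driven by one-way delay, one of the telemetry signals listed above. `target_delay_us`, `ai_step_gbps` and `md_factor` are hypothetical tuning knobs, and the 400 Gb/s line rate is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DelayBasedRateController:
    """Per-flow AIMD rate control driven by one-way-delay telemetry (illustrative only)."""
    line_rate_gbps: float = 400.0  # assumed SuperNIC line rate
    target_delay_us: float = 5.0   # hypothetical one-way-delay target
    ai_step_gbps: float = 10.0     # hypothetical additive increase per telemetry sample
    md_factor: float = 0.5         # hypothetical cap on a single multiplicative decrease
    min_rate_gbps: float = 1.0
    rate_gbps: float = field(init=False)

    def __post_init__(self) -> None:
        self.rate_gbps = self.line_rate_gbps  # start at line rate

    def update(self, one_way_delay_us: float) -> float:
        """Consume one telemetry sample and return the new sending rate in Gb/s."""
        if one_way_delay_us > self.target_delay_us:
            # Back off in proportion to the overshoot, but never more than md_factor at once.
            overshoot = (one_way_delay_us - self.target_delay_us) / one_way_delay_us
            self.rate_gbps *= 1.0 - min(self.md_factor, overshoot)
        else:
            # Queues are short: probe for more bandwidth.
            self.rate_gbps += self.ai_step_gbps
        self.rate_gbps = max(self.min_rate_gbps, min(self.rate_gbps, self.line_rate_gbps))
        return self.rate_gbps


cc = DelayBasedRateController()
for delay in (2.0, 20.0, 20.0, 4.0):
    print(cc.update(delay))  # 400.0, 200.0, 100.0, 110.0
```

Capping each decrease keeps a single noisy sample from collapsing the rate; a hardware implementation also has to handle sample timing and fairness between flows, which this sketch ignores.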

Form factor and physical deployment

The SN5600 is a 2U appliance with 64 OSFP cages and front-to-back airflow. Power draw at full 800G utilisation across all 64 ports is roughly 1.5 kW; rack PDU planning should assume 2 kW per switch. The BlueField-3 SuperNIC ships in FHHL and HHHL PCIe Gen5 x16 form factors as well as OCP NIC 3.0; the choice depends on the host chassis.

Cable / optic	Reach	Approx unit cost (USD, 2026)	Typical use
800G DAC (passive copper)	1-2 m	$420-650	Intra-rack BlueField-3 to leaf
800G AOC (active optical)	3-30 m	$2,400-4,000	Adjacent-rack runs
800G-DR8 single-mode optic	Up to 500 m	$2,800-3,600 each end	Spine uplinks, hall-to-hall
800G-FR8 single-mode optic	Up to 2 km	$4,500-6,500 each end	Campus interconnect
400G DAC (split mode)	1-3 m	$280-450	Mixed-speed access tiers
Linear pluggable optics (LPO 800G)	Up to 500 m	$2,200-2,800 each end	Power-sensitive deployments (~50% transceiver power savings)
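A small lookup makes the table usable when planning runs. The sketch picks the cheapest 800G option that covers a given distance, using the midpoint of each price range and counting two units per link where the table prices optics "each end". It ignores fibre, power and port interoperability, assumes a 0 m lower reach bound for the optics, and the `LinkOption` / `cheapest_link` names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkOption:
    name: str
    min_m: float
    max_m: float
    unit_cost_usd: float  # midpoint of the indicative 2026 range in the table
    per_end: bool         # True when priced "each end", i.e. two units per link

    def link_cost(self) -> float:
        return self.unit_cost_usd * (2 if self.per_end else 1)


# 800G rows from the table (the 400G split DAC is left out).
OPTIONS = [
    LinkOption("800G DAC", 1, 2, (420 + 650) / 2, per_end=False),
    LinkOption("800G AOC", 3, 30, (2_400 + 4_000) / 2, per_end=False),
    LinkOption("800G-DR8", 0, 500, (2_800 + 3_600) / 2, per_end=True),
    LinkOption("800G-FR8", 0, 2_000, (4_500 + 6_500) / 2, per_end=True),
    LinkOption("800G LPO", 0, 500, (2_200 + 2_800) / 2, per_end=True),
]


def cheapest_link(distance_m: float) -> LinkOption:
    """Return the lowest-cost 800G option whose reach covers distance_m."""
    candidates = [o for o in OPTIONS if o.min_m <= distance_m <= o.max_m]
    if not candidates:
        raise ValueError(f"no 800G option in the table reaches {distance_m} m")
    return min(candidates, key=LinkOption.link_cost)


for d in (1.5, 20, 200, 1_500):
    choice = cheapest_link(d)
    print(f"{d:>7} m -> {choice.name} (~${choice.link_cost():,.0f} per link)")
```

At 200 m the sketch prefers LPO over DR8 on price alone; whether LPO is usable also depends on the switch and NIC ports supporting linear optics, which the table does not capture.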

Software ecosystem

Spectrum-X switches run NVIDIA Cumulus Linux 5.x or community SONiC — both standard Ethernet NOSes operationally familiar to any Ethernet team. Cumulus is the supported path for AI-fabric features (NetQ telemetry integration, NV-API automation); SONiC supports the silicon but Spectrum-X-specific tuning has fewer turnkey defaults.

Cumulus Linux 5.x — primary supported NOS. Configured via NV CLI (nv set / nv config apply) or NV API (REST/gRPC). Ships with AI-fabric default templates.
NetQ — fabric telemetry, flow monitoring, validation engine. Replaces what UFM does for InfiniBand; integrates with Prometheus, Grafana, ServiceNow.
DOCA 2.7+ on BlueField-3 endpoints — the host kernel module + userland that exposes the AI CC algorithm, packet-spraying re-order engine, and per-flow telemetry.
NVIDIA Air — cloud-hosted digital twin of the fabric for validation before bring-up.
Optional: NVIDIA Mission Control (NV-MC) — top-of-rack-to-job-submission management for DGX SuperPOD-class Spectrum-X clusters.
Standards-based observability: Prometheus exporters, OpenTelemetry traces, sFlow/IPFIX flow records. All available via NetQ or directly from Cumulus.

# Bring up a Spectrum-X leaf-spine pair: RoCE lossless defaults plus adaptive routing (NVUE, Cumulus Linux 5.x)
# Apply on the leaf (swp1-32 = downlinks to BF-3 hosts, swp33-64 = uplinks to spines):
nv set system aaa user nvue-admin role system-admin
nv set system aaa user nvue-admin password '...'
nv set interface swp1-64 link mtu 9216
nv set qos roce mode lossless                                # turnkey RoCE defaults (PFC + ECN)
nv set router adaptive-routing enable on                     # enable adaptive routing globally
nv set interface swp33-64 router adaptive-routing enable on  # use it on the spine-facing uplinks
nv config apply

# Apply on the spine (all 64 ports face leaves):
nv set interface swp1-64 link mtu 9216
nv set qos roce mode lossless
nv set router adaptive-routing enable on
nv set interface swp1-64 router adaptive-routing enable on
nv config apply

# Verify
nv show qos roce
nv show router adaptive-routing
netq show events

Sizing and capacity planning

Spectrum-X scales from a single-rack pod (8 hosts, 64 GPUs) to a 32,000-GPU AI factory across single-, two- and three-tier topologies. The reference cluster sizes below are NVIDIA-published validated designs; intermediate sizes are linear interpolations.

Rail-optimised cabling: each BlueField-3 SuperNIC port is mapped to one of 8 rails, with spines colour-coded per rail. This is the same cabling discipline as an InfiniBand fat tree.
Oversubscription: 1:1 (non-blocking) is the default for training; 2:1 spine oversubscription halves spine count for inference / batch fabrics where AllReduce isn't the bottleneck.
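Power: budget up to 1.5 kW per switch at full load (1,000 W typical, 1,500 W maximum per the specification table) and plan 2 kW of PDU headroom per switch. Spines, with pluggable optics in all 64 ports, run closer to that maximum than leaves whose host downlinks use passive copper.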
Cooling: front-to-back airflow; the SN5600 fits standard hot-aisle/cold-aisle racks. No liquid cooling is required at 800G in 2026.
Yobitel's NeoCloud reference design for Ethernet-preferring sovereign tenants uses the 1,024-GPU two-tier build from the sizing table as its standard footprint, scaling to the 4,096-GPU three-tier shape for frontier-training reservations.
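One way to express the rail-optimised rule is a function from (host, GPU) to a leaf and port. The numbering below is hypothetical: it assumes 8-GPU hosts with one SuperNIC per GPU, GPU *g* always on rail *g*, and 32 hosts per rail group so that each leaf's swp1-32 downlinks serve one group. Under those assumptions it reproduces the leaf counts of the 256- and 1,024-GPU rows in the table that follows.

```python
RAILS = 8             # one rail per GPU / SuperNIC in an 8-GPU host (assumed)
HOSTS_PER_GROUP = 32  # one host per leaf downlink, swp1-32 (assumed)


def rail_port(host: int, gpu: int) -> tuple[int, int, str]:
    """Map 0-based (host, gpu) to (rail, leaf index, leaf downlink port)."""
    if not 0 <= gpu < RAILS:
        raise ValueError(f"gpu must be in 0..{RAILS - 1}")
    group = host // HOSTS_PER_GROUP
    leaf = group * RAILS + gpu  # each host group owns one leaf per rail
    port = f"swp{host % HOSTS_PER_GROUP + 1}"
    return gpu, leaf, port


def leaves_needed(num_hosts: int) -> int:
    groups = -(-num_hosts // HOSTS_PER_GROUP)  # ceiling division
    return groups * RAILS


print(rail_port(0, 0))     # (0, 0, 'swp1')
print(rail_port(33, 5))    # (5, 13, 'swp2')
print(leaves_needed(128))  # 32 leaves for 128 hosts / 1,024 GPUs
```

Keeping GPU *g* on rail *g* in every host means same-index GPUs across hosts share a leaf, so much of the collective traffic between them can stay within one switch hop.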

Pod size (GPUs)	Topology	SN5600 leaves	SN5600 spines	Switches total	HGX hosts (8 BlueField-3 SuperNICs each)	Indicative fabric BOM (USD)
64 (8 HGX hosts)	Single-tier	1	0	1	8	$70-95k
256	Two-tier (8x8)	8	8	16	32	$120-180k
1,024	Two-tier (32x16)	32	16	48	128	$320-450k
4,096	Three-tier	128	64 + 32	224	512	$1.4-1.9M
8,192	Three-tier non-blocking	256	128 + 64	448	1,024	$2.7-3.5M
16,384	Three-tier	512	256 + 128	896	2,048	$5.2-6.8M
32,000	Three-tier (NV ref)	1,000+	500 + 250	1,750+	4,000	$11-14M
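The two-tier rows can be checked with simple port arithmetic. The sketch assumes one leaf downlink per GPU and 32 downlinks plus up to 32 uplinks on each 64-port SN5600 leaf, and implements the 2:1 oversubscription rule above by shrinking uplinks; `two_tier` is an illustrative helper, not an NVIDIA sizing tool. It reproduces the 1,024-GPU row (32 leaves, 16 spines) and shows why, under these assumptions, pods beyond 2,048 GPUs need a third tier. For smaller pods it returns the minimum spine count, which can be below the table's figure (4 versus 8 at 256 GPUs).

```python
import math

SWITCH_PORTS = 64    # SN5600: 64 x 800 GbE
LEAF_DOWNLINKS = 32  # assumed: half of each leaf's ports face GPUs


def two_tier(gpus: int, oversubscription: int = 1) -> tuple[int, int, int]:
    """Return (leaves, spines, total switches) for a two-tier leaf-spine pod."""
    if oversubscription < 1 or LEAF_DOWNLINKS % oversubscription:
        raise ValueError("oversubscription must be a divisor of 32")
    uplinks_per_leaf = LEAF_DOWNLINKS // oversubscription  # 2:1 halves the uplinks
    leaves = math.ceil(gpus / LEAF_DOWNLINKS)
    if leaves > SWITCH_PORTS:
        # In a two-tier Clos every leaf connects to every spine, so a 64-port spine caps the pod at 64 leaves.
        raise ValueError(f"{gpus} GPUs needs {leaves} leaves: too many for two tiers")
    spines = math.ceil(leaves * uplinks_per_leaf / SWITCH_PORTS)
    return leaves, spines, leaves + spines


print(two_tier(1_024))     # (32, 16, 48), matching the table
print(two_tier(1_024, 2))  # (32, 8, 40): 2:1 oversubscription halves the spines
```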

Cost and TCO versus InfiniBand XDR

Spectrum-X's commercial pitch is comparable AI training performance at 60-70 % of the cost of an equivalent InfiniBand XDR fabric. The figures below are indicative USD ranges for new builds in early-to-mid 2026; negotiated pricing varies meaningfully.

At the same fabric scale, Spectrum-X is consistently 30-35 % cheaper than InfiniBand XDR on bill of materials. AllReduce throughput is within ~5 % at large messages and AllToAll within ~10 %; the gap is small enough that the cost saving wins for most operators with existing Ethernet operations capability. The exception is hyperscale frontier training (50k+ GPUs), where InfiniBand's slightly tighter tail latency still wins.

Line item	Spectrum-X 800G	InfiniBand XDR (Quantum-3) 800G	Delta
Switch (1U/2U, 64-port 800G)	$65-90k	$95-140k	-30 to -35 %
NIC (per unit)	$3,200-4,800 (BlueField-3 SuperNIC)	$3,800-5,500 (ConnectX-8 IB)	-13 to -16 %
Optics per end (800G DR8)	$2,800-3,600	$3,200-4,200	-12 to -14 %
Switch OS / mgmt	$0-2k per port (Cumulus included)	$5-8k per port-year (UFM Enterprise)	Much cheaper
Operator skill	Existing Ethernet team	InfiniBand specialist team	Variable; cluster-specific
Full 4,096-GPU fabric BOM (incl optics)	$8-11M	$13-16M	Roughly -30-35 %
Full 16,384-GPU fabric BOM	$32-42M	$50-65M	Roughly -30-35 %
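The Delta column is relative-cost arithmetic. The sketch below recomputes it from the table's USD ranges, pairing low end with low end and high end with high end; that pairing is an assumption, since the table does not say how its deltas were derived. `delta_pct` and `LINE_ITEMS` are illustrative names.

```python
def delta_pct(spectrum_x_usd: float, infiniband_usd: float) -> float:
    """Spectrum-X cost relative to InfiniBand XDR, as a signed percentage."""
    return (spectrum_x_usd - infiniband_usd) / infiniband_usd * 100


# (Spectrum-X low, high), (InfiniBand XDR low, high) in USD, taken from the table
LINE_ITEMS = {
    "switch": ((65_000, 90_000), (95_000, 140_000)),
    "NIC": ((3_200, 4_800), (3_800, 5_500)),
    "optic per end": ((2_800, 3_600), (3_200, 4_200)),
    "4,096-GPU fabric": ((8e6, 11e6), (13e6, 16e6)),
    "16,384-GPU fabric": ((32e6, 42e6), (50e6, 65e6)),
}

for item, ((sx_lo, sx_hi), (ib_lo, ib_hi)) in LINE_ITEMS.items():
    print(f"{item:18s} low end {delta_pct(sx_lo, ib_lo):6.1f} %   high end {delta_pct(sx_hi, ib_hi):6.1f} %")
```

On these inputs the NIC and optics lines land in the low-to-mid teens, the switch line at roughly -32 to -36 %, and the full-fabric rows between roughly -31 % and -38 %.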

Migration paths

Spectrum-X is most often deployed in one of two migration shapes: greenfield (new AI fabric, no legacy), or brownfield replacement of generic Ethernet + RoCEv2. Migration from InfiniBand to Spectrum-X is less common but increasing — driven by TCO at scale.

Brownfield from generic Ethernet: the win is roughly 25-40 % AllReduce throughput uplift at the same wire speed, plus tail-latency reduction in multi-tenant pods. Justifies the BlueField-3 rollout for most large-fabric operators.
Brownfield from InfiniBand: the win is 30-35 % TCO savings on the next fabric refresh. Run both fabrics in parallel during cut-over (12-16 weeks); pod-by-pod migration; revalidate every training job's NCCL performance before retiring the IB fabric.
Spectrum-X is forward-compatible with BlueField-4 (sampling 2026, volume 2027) and the SN6000 Spectrum-5 switch (sampling 2026) — buys an extra hardware generation of headroom.

Migration from	Effort level	Key risk	Typical timeline
Greenfield	Low	First-time AI-fabric ops learning curve	8-12 weeks design to production
Generic Ethernet + RoCEv2 (Tomahawk)	Medium	BlueField-3 NIC rollout across all hosts	12-20 weeks (NIC swaps + cluster validation)
InfiniBand NDR (Quantum-2)	High	Software re-certification, NIC + switch + cabling swap	16-24 weeks; usually pod-by-pod cutover
InfiniBand HDR (legacy)	High	Coupled refresh: HDR endpoints unsupported on Spectrum-X anyway	Treat as greenfield + decommission

Pitfalls and operational notes

Spectrum-X-specific congestion control requires BlueField-3 SuperNIC on every endpoint. ConnectX-7 endpoints still get RoCEv2 with adaptive routing, but the full AI-tuned CC loop needs BF-3.
Packet spraying assumes the receiver re-orders correctly. A misconfigured BlueField-3 (DOCA version drift) will deliver out-of-order packets to the RDMA engine, which collapses throughput silently. Pin DOCA version per cluster.
NetQ telemetry retention defaults to 7 days. For incident post-mortem capability, raise to 30-60 days and budget the storage.
Cable bend radius at 800G is tight; survey rack cabling plans before installation. Dirty optics produce slowly-growing symbol-error counters that flap a port days later.
Mixing Spectrum-4 and older Spectrum-3 switches in the same fabric works, but paths through Spectrum-3 are held to its lower port speeds; segregate the generations where possible.
PFC watchdog must be enabled. Spectrum-X's congestion control reduces the need for PFC under normal load, but PFC remains the safety net; a stuck pause frame still kills a port without the watchdog.
BlueField-3 also runs DOCA services (storage offload, security, host management) — coordinate the AI-fabric DOCA versions with the storage / security teams' DOCA expectations.
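Two of these pitfalls lend themselves to simple automated checks. The sketch assumes you already collect a host-to-DOCA-version inventory and periodic symbol-error counter samples with your own tooling (no specific collection command is implied); `doca_drift`, `steadily_rising` and the version strings are hypothetical. The rising-counter test is a crude heuristic for the slow creep a dirty optic produces, not a vendor threshold.

```python
from collections import Counter
from typing import Optional


def doca_drift(inventory: dict[str, str], pinned: Optional[str] = None) -> dict[str, str]:
    """Return {host: version} for hosts that differ from the pinned DOCA version.

    Without an explicit pin, the most common version in the fleet is treated as the pin.
    """
    if not inventory:
        return {}
    if pinned is None:
        pinned = Counter(inventory.values()).most_common(1)[0][0]
    return {host: ver for host, ver in inventory.items() if ver != pinned}


def steadily_rising(samples: list[int], min_samples: int = 4) -> bool:
    """True if a cumulative symbol-error counter grew between every pair of samples."""
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


fleet = {"gpu-001": "2.7.0", "gpu-002": "2.7.0", "gpu-003": "2.8.0"}  # made-up inventory
print(doca_drift(fleet))                   # {'gpu-003': '2.8.0'}
print(steadily_rising([0, 3, 7, 12, 20]))  # True: flag the port for optic inspection
```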

Tip: When evaluating Spectrum-X versus InfiniBand, run nccl-tests AllReduce AND AllToAll at the message sizes and rank counts your real workload uses — synthetic 8 GB AllReduce often makes both look identical; a 64 MB tensor-parallel AllReduce or a 32 MB MoE AllToAll reveals the differences. Decide on real-workload numbers, not headline marketing throughput.
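When comparing runs at different rank counts, compare bus bandwidth rather than raw algorithm bandwidth. nccl-tests (the `all_reduce_perf` and `alltoall_perf` binaries) reports both; the correction factors below follow the nccl-tests performance notes: 2(n-1)/n for AllReduce and (n-1)/n for AllToAll. The helper names and the sample timing are illustrative.

```python
def alg_bw_gbps(message_bytes: int, seconds: float) -> float:
    """Algorithm bandwidth in GB/s (1 GB = 1e9 bytes), as nccl-tests defines it."""
    return message_bytes / seconds / 1e9


def bus_bw_gbps(op: str, message_bytes: int, seconds: float, ranks: int) -> float:
    """Bus bandwidth: algorithm bandwidth scaled by the collective's correction factor."""
    factors = {
        "allreduce": 2 * (ranks - 1) / ranks,  # data crosses the links twice (reduce-scatter, all-gather)
        "alltoall": (ranks - 1) / ranks,       # only (n-1)/n of each rank's data leaves the rank
    }
    if op not in factors:
        raise ValueError(f"unsupported collective: {op}")
    return alg_bw_gbps(message_bytes, seconds) * factors[op]


# Hypothetical measurement: a 64 MB tensor-parallel AllReduce across 8 ranks taking 1 ms.
print(round(bus_bw_gbps("allreduce", 64_000_000, 1e-3, 8), 1))  # 112.0
```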

