NVIDIA BlueField-3 DPU

TL;DR

BlueField-3 (B3220 / B3210 / B3140 SKUs) is NVIDIA's third-generation DPU: a single ASIC combining a 400 Gb/s ConnectX-7-class NIC, 16 ARM Cortex-A78 cores at 2.0 GHz, up to 32 GB DDR5, and on-chip accelerators for crypto, regex, dedupe and storage protocols.
Presents up to 2 × 200 Gb/s or 1 × 400 Gb/s as Ethernet or InfiniBand (up to NDR); PCIe Gen5 x16 to the host, 16 × Cortex-A78 cores to the DOCA SDK, OSFP, QSFP112 or QSFP56 connectors depending on SKU.
Offloads RoCEv2 congestion control, NVMe-oF storage initiator / target, line-rate IPsec / TLS / MACsec, and packet telemetry — keeping host CPU cycles for tenant workloads and creating a hardware-isolated trust boundary between infrastructure and tenant.
Anchors the endpoint side of NVIDIA Spectrum-X, ships standard in DGX H100 / H200 reference designs, and is the SuperNIC inside every GB200 NVL72 compute node — including the ones underneath Yobitel NeoCloud's UK and EU sovereign regions.
Street price (early-to-mid 2026) is roughly $2,000-3,000 per card depending on SKU and channel. That is a premium over the equivalent dumb 400 Gb/s NIC, which pays back through host CPU savings, storage offload and multi-tenant isolation.

Overview

BlueField-3 is the third generation of NVIDIA's data-processing-unit family: a programmable infrastructure platform built around a high-bandwidth NIC, a general-purpose ARM CPU complex, and a set of on-die hardware accelerators. It was announced at GTC 2021 and entered volume production in 2023. It replaced BlueField-2 at the top of NVIDIA's DPU line and remains the workhorse DPU through 2026 alongside the newer BlueField-4.

The job description has not changed across generations. Keep infrastructure work (networking, storage, security, observability) off the host CPU, and provide a hardware trust boundary between tenant workloads on the host and the cloud operator's control plane on the DPU. What changed in BlueField-3 is the budget: PCIe Gen5 x16 to the host, up to 400 Gb/s of network bandwidth, 16 ARM Cortex-A78 cores at 2 GHz with up to 32 GB of DDR5, and hardware accelerators that run IPsec, TLS, MACsec, regex, dedupe and compression at line rate.

In a modern AI cluster, BlueField-3 sits behind every GPU server. NVIDIA's reference designs make it standard equipment: every DGX H100 / H200 system, every GB200 NVL72 compute tray, and the canonical NVIDIA-partner neocloud HGX template all include BlueField-3 SuperNICs. Yobitel NeoCloud follows the reference design. Every H100, H200 and GB200 NVL72 node in the UK and EU regions ships with BlueField-3 SuperNICs in NIC mode for line-rate RoCEv2 and GPUDirect RDMA. DPU mode is enabled on the storage tier for NVMe-oF offload and on the gateway tier for multi-tenant isolation.

This entry helps you decide whether BlueField-3 is the right SuperNIC for your AI build — what each SKU does, what the DOCA software stack expects from you, when NIC mode is enough and when you actually need DPU mode, and how the card fits into the Yobitel NeoCloud architecture if you would rather consume the capability as a managed service through Yobibyte.

Specifications

Authoritative figures for the three main SKUs as shipping in 2026. The B3220 is the dual-port 200 Gb/s flagship that sits in DGX H100/H200 hosts; the B3210 is a single-port 100 Gb/s variant for storage and gateway tiers; the B3140 is the 400 Gb/s single-port SKU paired with Spectrum-X SN5600 and Quantum-3 fabrics.

Property	B3220 (dual 200G)	B3210 (single 100G)	B3140 (single 400G)
Network ports	2 × 200 Gb/s	1 × 100 Gb/s	1 × 400 Gb/s
Protocols	Ethernet (RoCEv2) + IB NDR	Ethernet + IB EDR/HDR	Ethernet + IB NDR
Connector	QSFP112 / OSFP	QSFP56	OSFP twin-port
ARM cores	16 × Cortex-A78 @ 2.0 GHz	16 × Cortex-A78 @ 2.0 GHz	16 × Cortex-A78 @ 2.0 GHz
L2 / L3 cache	8 MB shared L3	8 MB shared L3	8 MB shared L3
DRAM	16-32 GB DDR5 on-card	16 GB DDR5 on-card	16-32 GB DDR5 on-card
Host interface	PCIe Gen5 x16	PCIe Gen5 x16	PCIe Gen5 x16
Hardware accelerators	IPsec, TLS, MACsec, RegEx, dedupe, compression	Same as B3220	Same as B3220
Crypto throughput	200 Gb/s line-rate IPsec	100 Gb/s line-rate IPsec	400 Gb/s line-rate IPsec
RDMA support	RoCEv2 + IB NDR	RoCEv2 + IB HDR	RoCEv2 + IB NDR
GPUDirect RDMA	Yes (with NVIDIA GPUs)	Yes	Yes
Power (typical)	~75 W	~55 W	~150 W
Form factor	PCIe HHHL / FHHL	PCIe HHHL	PCIe FHHL / OCP 3.0
First shipped	Q1 2023	Q2 2023	Q3 2023
Process node	TSMC 7 nm	TSMC 7 nm	TSMC 7 nm
Software	DOCA 2.x + DOCA-Host	DOCA 2.x + DOCA-Host	DOCA 2.x + DOCA-Host
Boot device	eMMC + optional NVMe	eMMC + optional NVMe	eMMC + optional NVMe
Out-of-band management	BMC interface + dedicated 1 GbE	BMC interface + dedicated 1 GbE	BMC interface + dedicated 1 GbE

Architecture: what changed in BlueField-3

BlueField-3 is built around four distinct silicon blocks on one die: a ConnectX-7-class network subsystem, an ARM CPU complex, a memory subsystem, and a fixed-function accelerator complex. Each block evolved meaningfully from BlueField-2.

Network subsystem. BlueField-3 inherits ConnectX-7 silicon, which delivers up to 400 Gb/s on a single port with PAM4 signalling, hardware RoCEv2 with NVIDIA's per-flow congestion control, InfiniBand NDR support, and GPUDirect RDMA over PCIe peer-to-peer. The same packet-processing engine handles the NIC side of Spectrum-X. Spectrum-4 switches spread packets across paths with per-packet adaptive routing. The NIC then places the resulting out-of-order packets directly into memory, so the application still sees an ordered stream.

ARM CPU complex. Sixteen Cortex-A78 cores at 2.0 GHz with 8 MB of shared L3. That is twice the core count of BlueField-2's eight Cortex-A72 cores at 2.75 GHz, on a newer microarchitecture with substantially higher per-clock performance. The A78 cores run a full Linux distribution (Ubuntu 22.04 or RHEL 9 are the supported targets) and host containerised offload workloads, with hardware isolation between the DPU OS and the host OS.

Memory subsystem. Up to 32 GB of on-card DDR5 — large enough to hold an entire NVMe-oF target's metadata, a TLS session cache for a high-fan-out reverse proxy, or a Suricata IDS rule set, depending on how the operator chooses to use the card.

Accelerator complex. Hardware engines for IPsec, TLS, MACsec, regex (compatible with Hyperscan), data dedupe and LZ4 compression, and the SHA-2 family. The crypto engines run at line rate: a B3140 can encrypt 400 Gb/s of IPsec while its ARM cores stay essentially idle.

Form factor, power and thermal

BlueField-3 ships in three physical form factors. PCIe full-height half-length (FHHL) is the standard in DGX H100 / H200 hosts. PCIe half-height half-length (HHHL, low-profile) is the option for 1U servers whose slots cannot take a full-height card. OCP 3.0 is the form factor used inside GB200 NVL72 compute trays and most hyperscale designs.

Power draw varies sharply by SKU. The B3210 (100 Gb/s) draws ~55 W typical, the B3220 (2 × 200 Gb/s) ~75 W, and the B3140 (400 Gb/s) ~150 W. The B3140's extra draw comes primarily from its higher-rate SerDes; its DDR5 configuration matches the B3220's. The 400 Gb/s variant typically needs active cooling: a forced-air slot near the front of the chassis or, in OCP 3.0, the host-supplied airflow.

Thermal: the ASIC operates safely up to 95 °C junction temperature. Above 85 °C the firmware throttles the ARM cores first, the crypto engines next, and finally the network throughput. Sustained throttling is visible through the DOCA telemetry endpoint and the MFT temperature reading (mget_temp). Instrument both before declaring a deployment stable.
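A minimal sketch of turning a temperature reading into the throttle stages described above. The 85 °C and 95 °C thresholds are taken from this entry, not from a datasheet, and the function name is hypothetical. It assumes the reading arrives as a whole number of degrees Celsius, for example from mget_temp, which is assumed here to print a bare integer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a BlueField-3 junction temperature (whole °C)
# against the thresholds quoted in this entry (assumed, not datasheet values):
#   <= 85     normal
#   86..95    firmware throttling (ARM cores, then crypto, then network)
#   > 95      above the stated safe junction limit
bf3_thermal_state() {
  local temp_c=$1
  if (( temp_c > 95 )); then
    echo "over-limit"
  elif (( temp_c > 85 )); then
    echo "throttling"
  else
    echo "normal"
  fi
}

# Example (device path as used elsewhere in this entry):
#   bf3_thermal_state "$(mget_temp -d /dev/mst/mt41692_pciconf0)"
```

Feeding this into an alerting loop lets you separate brief excursions from the sustained throttling the paragraph warns about.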

Interconnect: where BlueField-3 sits on PCIe and on the fabric

On the host side, BlueField-3 presents itself as a PCIe Gen5 x16 device. On a Sapphire Rapids or Genoa host it shares the PCIe topology with one or more NVIDIA GPUs, and its placement relative to them matters. GPUDirect RDMA between a BlueField-3 and an H100 / H200 / B200 GPU works best when both devices share the same PCIe switch. It is worse when traffic must pass through the CPU's host bridge on the same socket, and worst when the devices sit under different sockets and traffic must traverse UPI / Infinity Fabric.

On the network side, BlueField-3 connects to a Spectrum-X (SN5600), Quantum-2 (MQM9700) or Quantum-3 leaf switch over OSFP (NDR / 400 Gb/s) or QSFP112 (200 Gb/s) cables. RoCEv2 mode requires lossless Ethernet tuning end-to-end (PFC + ECN + DCQCN) but no special operator effort on the DPU side — the per-flow congestion control runs in the ConnectX-7 silicon transparently.
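PFC is normally enabled on a single traffic class reserved for RoCE. The helper below builds the eight-entry enable list that NVIDIA's mlnx_qos tool takes for its --pfc option. It is a minimal sketch that assumes RoCE traffic is mapped to priority 3, a common convention that this entry does not specify. ECN and DCQCN settings are out of scope here, and the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an 8-entry PFC enable list (priorities 0..7)
# with only the given RoCE priority set to 1, e.g. 3 -> 0,0,0,1,0,0,0,0.
pfc_list() {
  local roce_prio=$1 out="" p
  if (( roce_prio < 0 || roce_prio > 7 )); then
    echo "priority must be 0-7" >&2
    return 1
  fi
  for p in 0 1 2 3 4 5 6 7; do
    out+=$(( p == roce_prio ? 1 : 0 ))   # 1 enables PFC on this priority
    if (( p < 7 )); then
      out+=","
    fi
  done
  echo "$out"
}

# Example (interface name is illustrative):
#   mlnx_qos -i ens6f0np0 --pfc "$(pfc_list 3)"
```

Apply the same priority on every switch port in the path. PFC only prevents drops when both ends agree on which class is lossless.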

BlueField-3 also exposes an out-of-band management interface. A dedicated 1 GbE link plus a BMC channel let the operator manage the DPU's ARM OS independently of the host OS — important when the DPU is enforcing security policies the host should not be able to bypass or interrupt.

Tip: For GPUDirect RDMA performance, prefer hosts where the BlueField-3 and the GPU sit under the same PCIe switch (the PEX-class chip on HGX baseboards). nvidia-smi topo -m reveals the path. Aim for PIX (at most one PCIe switch) or PXB (several switches, no host bridge) over PHB (through the CPU host bridge), NODE (between host bridges within one NUMA node) or SYS (across sockets).
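To check placement across a fleet rather than by eye, the matrix can be parsed. The ranking below follows nvidia-smi's own legend: PIX (at most one PCIe bridge), PXB (multiple bridges, no host bridge), PHB (through the CPU host bridge), NODE (between host bridges within one NUMA node) and SYS (across the socket interconnect). This is a minimal sketch under three assumptions: the nvidia-smi topo -m output is plain and uncoloured, its first line is the column header, and each device row starts with the device name. GPU0 and NIC0 are example labels, and both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking GPU-to-NIC placement from `nvidia-smi topo -m`.

# Rank a topo code; lower is better for GPUDirect RDMA.
topo_rank() {
  case "$1" in
    PIX)  echo 1 ;;  # at most one PCIe bridge (same switch)
    PXB)  echo 2 ;;  # several PCIe bridges, no host bridge
    PHB)  echo 3 ;;  # through the CPU's PCIe host bridge
    NODE) echo 4 ;;  # between host bridges within one NUMA node
    SYS)  echo 5 ;;  # across the SMP interconnect (UPI / Infinity Fabric)
    *)    echo 9 ;;  # self ("X"), NVLink entries or anything unrecognised
  esac
}

# Print the code linking row device $1 (e.g. GPU0) to column device $2
# (e.g. NIC0). Reads the matrix on stdin; line 1 must be the column header.
topo_link() {
  awk -v row="$1" -v col_name="$2" '
    NR == 1 {
      # Header fields are device names only; data rows carry an extra leading
      # field (the row name), so the matching data column is i + 1.
      for (i = 1; i <= NF; i++) if ($i == col_name) col = i + 1
      next
    }
    col && $1 == row { print $col; exit }
  '
}

# Example:
#   topo_rank "$(nvidia-smi topo -m | topo_link GPU0 NIC0)"
```

Map NIC0 to an mlx5 device through the NIC legend printed at the bottom of the same output.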

Software ecosystem: DOCA, drivers, deployment modes

DOCA is NVIDIA's SDK and runtime for BlueField. It provides libraries and reference applications for flow programming, RDMA acceleration, telemetry export, packet processing, storage protocols, and security inspection. On the host side, DOCA-Host installs the driver stack: mlx5_core plus rdma-core, which provides libibverbs. GPUDirect RDMA also needs the nvidia-peermem module, which ships with the NVIDIA GPU driver rather than with DOCA. On the DPU side, DOCA-DPU installs the runtime, container engine and library set on the ARM OS.

BlueField-3 supports three deployment modes; the mode is chosen at deployment time and changes the host's view of the card and the operator's responsibilities.

NIC mode. The DPU behaves as a fast RDMA NIC. The ARM OS runs minimal services; the card looks like a ConnectX-7 from the host's perspective. This is the default for most AI training clusters — Yobitel NeoCloud's GPU compute tier runs the SuperNICs in NIC mode for line-rate RoCEv2 and GPUDirect RDMA, and lets the host orchestration plane handle infrastructure logic.
DPU mode. The ARM OS runs a full Linux distribution and hosts offload services, typically containerised. The host sees a NIC; the operator sees a separate addressable Linux machine on every card. Used in NeoCloud's storage tier (NVMe-oF target offload) and in NeoCloud's multi-tenant gateway tier (where tenant traffic is encrypted at the DPU before it crosses the host).
Zero-trust mode. The DPU enforces policies that the host cannot see or bypass — RBAC, encrypted-tenant-traffic, hardware-attested firewall. Used by Yobibyte's multi-tenant pods for tenant-to-tenant isolation: tenants share physical hosts but their inter-host traffic flows through DPU-enforced encryption and policy that the host kernel has no path to disable.

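# Verify host-side DOCA installation (run as root on the host)
mst start && mst status -v
mlxconfig -d /dev/mst/mt41692_pciconf0 query | head

# Show BlueField-3 network port state from the host
# (ens6f0np0 is an example interface name; substitute your own)
ibstat | grep -E "Active|Rate|Link layer"
ethtool ens6f0np0 | grep -E "Speed|Link"

# Check NIC vs DPU mode from the host
mlxconfig -d /dev/mst/mt41692_pciconf0 query INTERNAL_CPU_MODEL INTERNAL_CPU_OFFLOAD_ENGINE
# INTERNAL_CPU_MODEL:          EMBEDDED_CPU(1) = Arm owns the NIC resources, SEPARATED_HOST(0) = separated host
# INTERNAL_CPU_OFFLOAD_ENGINE: ENABLED(0) = DPU mode, DISABLED(1) = NIC mode
# Zero-trust is a restricted-host variant of DPU mode, set from the Arm side (e.g. with mlxprivhost)

# Bring up a DOCA service container on the DPU (commands after ssh run on the Arm OS)
ssh ubuntu@<dpu-mgmt-ip>
# BFB images run DOCA services as kubelet static pods: copy the service's pod
# YAML (from its NGC page) into /etc/kubelet.d and kubelet starts the container
sudo cp <doca-service>.yaml /etc/kubelet.d/
sudo crictl ps | grep -i doca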
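For inventory scripts, the mode can be read from the query output instead of by eye. This minimal sketch rests on two assumptions. First, BlueField-3's mlxconfig reports INTERNAL_CPU_OFFLOAD_ENGINE as ENABLED(0) in DPU mode and DISABLED(1) in NIC mode. Second, the plain query output has a single value column holding the next-boot setting, so a pending change shows up before the power cycle it requires. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `mlxconfig -d <dev> query ...` output on stdin and
# print NIC, DPU or unknown based on INTERNAL_CPU_OFFLOAD_ENGINE.
# Assumes a single value column such as "DISABLED(1)" or "ENABLED(0)".
bf3_mode() {
  awk '
    $1 == "INTERNAL_CPU_OFFLOAD_ENGINE" {
      if ($2 ~ /\(1\)$/)      mode = "NIC"   # DISABLED(1): Arm offload engine off
      else if ($2 ~ /\(0\)$/) mode = "DPU"   # ENABLED(0): Arm owns the NIC resources
    }
    END { print (mode == "" ? "unknown" : mode) }
  '
}

# Example:
#   mlxconfig -d /dev/mst/mt41692_pciconf0 query INTERNAL_CPU_OFFLOAD_ENGINE | bf3_mode
```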

Sizing and capacity planning

BlueField-3 is rarely sized in isolation — the question is how many DPUs per GPU server, which SKU per tier, and how much of the work to offload to DOCA versus leave on the host. The table below maps Yobitel NeoCloud's choices to the workload class.

For training-only fabrics, prefer NIC mode and avoid the DOCA operational tax. The DPU's value is delivered by the network silicon and GPUDirect, not the ARM cores.
For multi-tenant pods, prefer zero-trust mode and budget the operations cost. The DPU becomes a second managed machine per host; treat its OS image, firmware and DOCA release as first-class lifecycle artefacts.
DDR5 capacity matters per role. NVMe-oF targets need 32 GB; tenant gateways often run fine on 16 GB; light NIC-mode operation needs the minimum.
Power-budget check: 8 × B3140 in a single DGX-class chassis adds ~1.2 kW (8 × ~150 W) to the host power draw. Check the chassis PSU budget against the OEM spec sheet before increasing the DPU count.

Workload tier	SKU per host	Mode	DPU role	Yobitel NeoCloud pattern
DGX H100 / H200 training	4-8 × B3220 or B3140	NIC mode	Line-rate RoCEv2 + GPUDirect RDMA	Standard NeoCloud training compute
GB200 NVL72 training	4 × B3140 per compute tray (one per GPU)	NIC mode	400G NDR per rail, SHARPv3 + GPUDirect	NeoCloud Blackwell pods, UK & EU
Inference / mixed-tenancy	1-2 × B3220	DPU mode	Tenant isolation, TLS offload	NeoCloud inference tier; Yobibyte managed endpoints sit on this tier
NVMe-oF storage target	2 × B3220 or B3140	DPU mode	NVMe-oF target, dedupe, compression	NeoCloud parallel storage cluster
Multi-tenant gateway	2 × B3210 or B3220	Zero-trust mode	Per-tenant IPsec, hardware-attested policy	NeoCloud tenant-edge; underlies Yobibyte tenant isolation
Edge / on-prem (sovereign)	1-2 × B3210	DPU mode	Local crypto, observability, light gateway	Optional NeoCloud Edge nodes
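The power-budget check reduces to simple arithmetic. In this minimal sketch, the per-card figure (150 W for the B3140, from this entry's table), the current host draw and the usable PSU budget are all operator inputs. The function name is hypothetical, and the example host numbers are illustrative rather than a vendor specification.

```bash
#!/usr/bin/env bash
# Hypothetical helper: does adding DPUs keep a chassis inside its power budget?
# Args: card count, typical W per card, current host draw (W), usable PSU budget (W).
dpu_power_check() {
  local cards=$1 watts_per_card=$2 host_draw_w=$3 psu_budget_w=$4
  local added=$(( cards * watts_per_card ))
  local total=$(( host_draw_w + added ))
  local headroom=$(( psu_budget_w - total ))
  if (( headroom >= 0 )); then
    echo "OK added=${added}W total=${total}W headroom=${headroom}W"
  else
    echo "OVER added=${added}W total=${total}W short_by=$(( -headroom ))W"
  fi
}

# Example with illustrative host figures:
#   dpu_power_check 8 150 9000 10500   # OK added=1200W total=10200W headroom=300W
```

With 8 cards at 150 W the added load is the ~1.2 kW quoted in the sizing notes. Whether that fits depends entirely on the chassis and rack budgets you feed in.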

Cost and TCO

Card prices are negotiated and depend on SKU mix, channel, support contract and volume. The figures below are indicative USD ranges for new builds in mid-2026; OEM-rebrand variants (Dell, HPE, Supermicro) sit at the higher end.

Total fabric BOM contribution: a 128-GPU H100 cluster (16 hosts × 8 GPUs) with 4 × B3220 per host uses 64 cards. The DPU spend is roughly $130-180k, about 4-6 % of the GPU spend.
Yobitel NeoCloud bakes the DPU cost into the per-GPU-hour pricing; customers consuming via Yobibyte never see a separate DPU line item.
Compared with a dumb 400 Gb/s NIC at ~$1,800-2,400, the DPU premium is ~$500-800 per card. The payback is host CPU savings on RoCEv2 tuning, NVMe-oF state machines, and tenant-side encryption — easy to justify above 4 GPUs per host.

Line item	Indicative USD price	Notes
B3210 single-port 100 Gb/s	$1,400-1,900 per card	Storage / gateway tier
B3220 dual-port 200 Gb/s	$2,000-2,800 per card	DGX H100/H200 standard
B3140 single-port 400 Gb/s	$2,400-3,200 per card	Spectrum-X / Quantum-3 endpoint
NVIDIA DOCA-Host subscription	Bundled with card support	Per-card; check the OEM contract
DOCA community	Free	Pre-production / lab use
Support contract (Bronze)	~$200-350 per card/year	Updates only
Support contract (Gold/Premier)	~$500-900 per card/year	Updates + RMA + named TAM
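The cluster-level figure follows from host count, cards per host and the per-card price band. This is a minimal sketch with a hypothetical function name. The prices are the indicative ranges from the table above, and integer dollars are used throughout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: DPU card count and spend range for a GPU cluster.
# Args: total GPUs, GPUs per host, DPUs per host, low and high USD per card.
dpu_bom() {
  local gpus=$1 gpus_per_host=$2 cards_per_host=$3 usd_lo=$4 usd_hi=$5
  local hosts=$(( (gpus + gpus_per_host - 1) / gpus_per_host ))  # round up to whole hosts
  local cards=$(( hosts * cards_per_host ))
  echo "hosts=$hosts cards=$cards usd_low=$(( cards * usd_lo )) usd_high=$(( cards * usd_hi ))"
}

# Examples with the B3220 band ($2,000-2,800 per card), 4 cards per 8-GPU host:
#   dpu_bom 128 8 4 2000 2800    # hosts=16 cards=64 usd_low=128000 usd_high=179200
#   dpu_bom 1024 8 4 2000 2800   # hosts=128 cards=512 usd_low=1024000 usd_high=1433600
```

Sixteen hosts of 8 GPUs is a 128-GPU cluster and lands in the $130-180k range. A 1,024-GPU build at the same ratio needs 128 hosts and roughly $1.0-1.4M of DPUs.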

Migration and alternatives

BlueField-3 competes with two adjacent classes of device: dumb high-speed NICs (cheaper, less capable) and other DPU families (AMD Pensando, Intel IPU, Marvell Octeon, AWS Nitro). The right choice depends on what you actually intend to offload.

Migrating from BlueField-2 to BlueField-3 is mechanical: same OCP/PCIe form factors, DOCA 2.x retains API compatibility, but the firmware lifecycle (BFB images) must be aligned across the fleet.
Migrating from a dumb NIC to BlueField-3 is straightforward in NIC mode (same mlx5_core driver) but operationally heavy in DPU mode (new ARM OS, new container runtime, new attack surface).
Yobitel NeoCloud's current standard is BlueField-3 across the H100/H200/GB200 fleet; the BlueField-4 transition begins with the GB300 NVL72 pods entering the UK region in 2026.

Alternative	Strengths	Weaknesses	Best for
ConnectX-7 NIC (dumb)	Cheapest at line rate; same network silicon as B3220	No ARM CPU, no DOCA, no offload	Pure training fabrics where DPU mode is unused
BlueField-3 (this entry)	Mature DOCA, broad ecosystem, GPUDirect	Most expensive per-port; complex operations in DPU mode	AI clusters wanting full NIC + DPU capability
BlueField-4 DPU	800 Gb/s, ~64 ARM cores, Blackwell-era SKU	Newer, smaller install base, costlier	GB300 NVL72 era fabrics, 2026+ new builds
AMD Pensando DSC2-400	Strong P4 pipeline, deployed by HPE/AMD	Smaller DOCA-equivalent ecosystem; weaker GPU integration	AMD MI300X-based clusters; HPE-standard SKUs
Intel IPU E2000 (Mount Evans)	Tight Intel Xeon integration, P4 pipeline	Limited InfiniBand; smaller AI footprint	Hyperscale builds standardised on Intel networking
AWS Nitro / Microsoft Hololake	Proven at hyperscale	Not for sale	Internal hyperscale only

Pitfalls and operational notes

Firmware drift is the silent killer. The DPU runs three coupled firmwares (NIC, ARM bootloader, BFB image); mixing versions across a fleet causes sporadic RoCEv2 connection drops and silent GPUDirect fallback. Pin a DOCA release per pod and document the upgrade window.
NIC mode versus DPU mode is a one-way migration in practice — moving from NIC mode to DPU mode after the host is in production requires a reboot, new firmware image, and re-cabling of the OOB management network.
BAR1 sizing on the host GPU affects GPUDirect registration. If BAR1 is small (default on many BIOS), large RDMA registrations from training frameworks fail with cryptic ibv_reg_mr errors. Set BAR1 to the GPU's HBM size in BIOS.
DPU-mode containers run on ARM, not x86. Building an ARM image, hardware-attesting it, and shipping it to a fleet of DPUs is a different supply chain than the host application supply chain. Treat it as such.
OOB management network: do not let DPU management share a VLAN with host workloads. The DPU is meant to be a separate trust domain; a shared management VLAN collapses the model.
PCIe ACS (Access Control Services) enabled on intermediate root-port bridges blocks GPUDirect peer-to-peer. Disable per OEM guidance; verify with lspci -vv | grep ACSCtl.
Power: 8 × B3140 per host adds ~1.2 kW; verify chassis PSU and rack PDU sizing before scale-out.
DOCA telemetry is opt-in. Enable it before the first production run, not after the first incident.
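Firmware drift is easiest to catch with a fleet-wide comparison. This minimal sketch assumes you already collect one "<host> <firmware-version>" line per DPU, for example from flint queries or an inventory system; the collection step is not shown. It also assumes that any version mismatch within a pod should be flagged. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "<host> <version>" lines on stdin and report whether
# every DPU in the pod runs the same firmware version.
fw_drift() {
  awk '
    NF >= 2 { count[$2]++; hosts[$2] = hosts[$2] " " $1 }
    END {
      n = 0
      for (v in count) n++
      if (n == 0)      print "no data"
      else if (n == 1) print "consistent"
      else for (v in count) print "drift " v ":" hosts[v]   # one line per version
    }
  ' | sort   # awk array order is unspecified; sort for stable output
}

# Example:
#   printf 'dpu-a 32.39.1002\ndpu-b 32.40.1000\n' | fw_drift
```

Run the same comparison separately for each of the three coupled firmwares. A match on the NIC firmware alone does not rule out a mismatched BFB image.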

Warning: A BlueField-3 deployed in DPU mode with default settings will silently accept any container that lands on its container runtime — including via SSH from a misconfigured operator. Treat the DPU's ARM OS as a separate hardened host: signed images only, RBAC-controlled SSH, audit logging exported to a separate collector. Yobitel NeoCloud's DPU-mode tier runs hardware-attested signed images only; replicate that discipline before going to production.
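The ACS check from the pitfalls list can be scripted rather than eyeballed. This minimal sketch assumes lspci -vv output in its usual layout, where device header lines start with the bus address and capability lines are indented. It prints every device whose ACSCtl line shows any feature enabled (+). That is broader than strictly necessary, but it matches the disable-ACS guidance. The function name is hypothetical; run lspci as root so the capability lines are not hidden.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `lspci -vv` output on stdin and print the address of
# every device whose ACS control register has at least one feature enabled.
acs_enabled_devices() {
  awk '
    /^[0-9a-fA-F]/    { dev = $1 }     # device header, e.g. "00:01.0 PCI bridge: ..."
    /ACSCtl:/ && /\+/ { print dev }    # any "+" on the ACSCtl line
  '
}

# Example (as root):
#   lspci -vv | acs_enabled_devices
```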

Where it fits in the Yobitel stack

BlueField-3 is the SuperNIC inside every Yobitel NeoCloud H100, H200 and GB200 NVL72 node. In the training tier it runs in NIC mode, delivering line-rate RoCEv2 (or InfiniBand NDR) and GPUDirect RDMA to NCCL. In the storage tier it runs in DPU mode, hosting NVMe-oF target offload and dedupe/compression. In the multi-tenant inference and gateway tier it runs in zero-trust mode, providing hardware-attested tenant isolation that lets Yobibyte safely share physical hosts across tenants while preserving the NCSC OFFICIAL classification on the UK sovereign region.

Customers consuming Yobitel NeoCloud directly see BlueField-3's effect as low-latency, line-rate inter-node bandwidth and a low host-CPU footprint for networking. Customers consuming through Yobibyte see it as managed multi-tenant inference endpoints that share hardware without sharing trust. Customers running InferenceBench's published throughput numbers see it as the unspoken substrate that lets the benchmark hit deterministic numbers across pods. The card is invisible in the customer surface; the behaviour it enables is not.

References

NVIDIA BlueField-3 DPU Product Page · NVIDIA
BlueField-3 DPU Datasheet · NVIDIA
DOCA SDK Documentation · NVIDIA
NVIDIA Spectrum-X Reference Architecture · NVIDIA
DGX H100/H200 System Architecture · NVIDIA
