TL;DR
- Extended-performance variant of HBM3 (JEDEC JESD238B updates) with pin speeds raised from 6.4 Gb/s to up to 9.6 Gb/s, per-stack bandwidth crossing 1.2 TB/s, and 12-high stack capacities reaching 36 GB. Same 1,024-bit-per-stack interface, same TSV stacking, same 16-channel HBM3 protocol — backward-compatible at the controller level.
- Stack vendors shipping in volume from 2024: SK hynix (first to qualification, dominant supply share), Micron (volume qualified mid-2024, primary US supply), Samsung (slower qualification ramp, recovered through 2025). Process: 10 nm-class DRAM (1a/1b/1c node, vendor-specific).
- The memory generation that unlocked production frontier AI: NVIDIA H200 (141 GB at 4.8 TB/s, 6 stacks), B100/B200 (192 GB at 8 TB/s, 8 stacks per GPU die-pair), B300/GB300 Ultra (288 GB at 8 TB/s), AMD MI300X (192 GB at 5.3 TB/s, mixed HBM3/HBM3e variants), MI325X (256 GB at 6.0 TB/s, full HBM3e).
- Production capacity at SK hynix, Micron and Samsung was the primary supply constraint on Hopper-refresh and Blackwell ramps through 2024-2025: the dominant gating factor on H200/B200 availability was HBM3e stack supply and yield, not GPU die yield. Allocation politics through 2025 favoured the largest hyperscaler customers.
- HBM4 lookahead: JEDEC JESD270-4, 2,048-bit interface per stack (twice HBM3e), pin speeds of 6.4-8.0 Gb/s, per-stack bandwidth ~2 TB/s, 16-high stacks at 48-64 GB. First HBM4 silicon sampled in late 2025; volume production is expected in 2026-2027 on next-generation parts.
Overview
HBM3e — written variously as HBM3E, HBM3-E, or 'HBM3 Extended' — is the extended-performance variant of HBM3 that the major DRAM vendors and JEDEC converged on through 2023 to address the memory-bandwidth crisis in AI training. The mechanical and electrical foundation is identical to HBM3: 1,024-bit interface per stack, 16 independent channels, TSV (Through-Silicon Via) 3D stacking, microbump connection to the base die, silicon interposer or CoWoS package to the host GPU. What changed is pin signalling speed (from 6.4 Gb/s to up to 9.6 Gb/s per pin), 12-high stack production (versus 8-high on launch HBM3), and DRAM process maturity that pushed per-stack capacity from 16-24 GB to 24-36 GB.
HBM3e is the memory that defined the post-2024 AI accelerator wave. Before HBM3e, the largest commercially available HBM stack was 24 GB at 819 GB/s; afterwards, 36 GB at 1.2 TB/s. That ~50 % bandwidth uplift and ~50 % capacity uplift per stack, compounded across 6-8 stacks per GPU, is what unlocked the 141 GB Hopper-class memory budget (H200), 192-288 GB Blackwell budgets, and AMD's 256 GB MI325X. Without HBM3e, frontier reasoning models with multi-hundred-billion-parameter active footprints and trillion-parameter MoE training would not be commercially deployable on the GPU SKUs they actually run on.
This entry covers what HBM3e is physically, how it differs electrically and mechanically from HBM3, which accelerators it ships in, the supply-chain reality of 2024-2026 production, the operational pitfalls of 12-high stacks, and the HBM4 transition that is starting to displace it at the leading edge from late 2026 onward. Every high-end GPU Yobitel NeoCloud and Yobibyte run on top of — H200, B100/B200/B300, GB200 NVL72, AMD MI325X/MI355X — is HBM3e-backed, and HBM bandwidth is one of the scheduling attributes Yobibyte uses when matching memory-pressured workloads to capacity. This entry helps you understand why HBM3e capacity matters for your training and serving choices on Yobibyte and NeoCloud, and how to read the 141 GB / 192 GB / 256 GB / 288 GB per-GPU figures in vendor datasheets when sizing your footprint.
How it works: the HBM3e stack, TSVs, and the interposer
An HBM3e stack is a 3D-integrated DRAM module — a stack of 8 or 12 individual DRAM dies bonded together vertically using Through-Silicon Vias (TSVs), sitting on top of a base die (sometimes called the logic die or buffer die) that handles the host interface, signal redistribution, and per-channel command/address logic. The whole stack is microbumped to a silicon interposer underneath, which routes the 1,024-bit-per-stack interface to the host GPU die in a CoWoS-S or CoWoS-L (NVIDIA H200/B200) or InFO_oS (AMD MI300X) package. The interposer is the unsung hero — without 2.5D silicon interposer packaging, HBM's wide interface would be physically impossible to route at PCB scale.
The 1,024-bit-per-stack interface is divided into 16 channels of 64 bits each (HBM3 further splits each channel into two 32-bit pseudo-channels). Each channel operates independently with its own command/address bus, allowing 16-way channel-level parallelism per stack. That width is what makes HBM's aggregate bandwidth so much higher than a DDR5 channel's: HBM gets its throughput from a very wide bus rather than from extreme per-pin clock rates. HBM3e pin speeds reach up to 9.6 Gb/s (the SK hynix 9.6 Gb/s parts shipping in late 2025 are the upper bound; most volume HBM3e in 2024-2025 ran at 8.0-9.2 Gb/s), giving per-stack aggregate bandwidth of (1,024 bits / 8) x 9.6 Gb/s ≈ 1.23 TB/s for top-bin parts, and ~1.02-1.18 TB/s for the more common 8.0-9.2 Gb/s bins.
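The per-stack arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a vendor formula: the function name `stack_bandwidth_gbs` is hypothetical, and the result is a peak figure in decimal GB/s that ignores refresh and protocol overhead.

```python
def stack_bandwidth_gbs(interface_bits: int, pin_speed_gbps: float) -> float:
    """Peak per-stack bandwidth in GB/s: data pins x Gb/s per pin / 8 bits per byte."""
    return interface_bits * pin_speed_gbps / 8


# Pin-speed bins quoted in the text, all on the 1,024-bit HBM3/HBM3e interface.
BINS = [("HBM3", 6.4), ("HBM3e 8.0", 8.0), ("HBM3e 9.2", 9.2), ("HBM3e 9.6", 9.6)]

for label, pin_speed in BINS:
    print(f"{label:>10}: {stack_bandwidth_gbs(1024, pin_speed):7.1f} GB/s")
```

The 9.6 Gb/s bin works out to 1,228.8 GB/s, which is the 1.23 TB/s headline figure; the 6.4 Gb/s HBM3 bin gives the familiar 819 GB/s.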
Stack heights matter as much as pin speed. The 8-high stacks (8 DRAM dies, 24 GB total at 3 GB per die) were the first HBM3e production volume in early 2024, and they are what B200 uses to reach 192 GB per GPU (8 stacks x 24 GB), because 12-high yields were not commercial-ready at Blackwell launch. The 12-high stacks (12 dies, 36 GB total) shipped in volume from mid-2024 and are what enables 288 GB on B300 (8 stacks x 36 GB) once 12-high production matured. The 12-high stack is taller, requires more sophisticated thermal management because the heat has further to travel out, and is harder to manufacture because every additional die compounds TSV and bonding yield risk.
Capacity per die also climbed. The first HBM3e dies were 3 GB (24 Gb), built on roughly 1a-node DRAM process (~12 nm class); the 1b and 1c nodes pushed to higher density, and by 2026 leading-edge HBM3e dies are 3-4 GB per layer on 1b/1c process at SK hynix and Micron, with Samsung following on its 1c-equivalent node. The combination of higher pin speed, 12-high stacking and denser dies is what gets a single stack to 36 GB at 1.2 TB/s, the headline HBM3e number that defines its position in the memory hierarchy.
- Stack architecture: 8 or 12 DRAM dies bonded vertically via TSVs on a base die, microbumped to a silicon interposer that routes the 1,024-bit interface to the GPU.
- Interface: 1,024 bits per stack divided into 16 channels of 64 bits each (32 pseudo-channels of 32 bits); same protocol as HBM3, backward-compatible at the controller.
- Pin signalling: up to 9.6 Gb/s per pin (HBM3 was 6.4 Gb/s); volume parts in 2024-2026 typically run at 8.0-9.2 Gb/s depending on bin and customer.
- Per-stack bandwidth: 1.0-1.23 TB/s depending on pin speed bin.
- Per-stack capacity: 24 GB (8-high x 3 GB/die) or 36 GB (12-high x 3 GB/die); some 2026 parts push to 48 GB (12-high x 4 GB/die).
- Voltage: 1.1 V (down from HBM2e's 1.2 V), reducing per-bit energy.
- Process: 10 nm-class DRAM (1a/1b/1c, vendor-specific).
- Packaging: CoWoS-S, CoWoS-L (NVIDIA) or InFO_oS (AMD) silicon interposer; not socketable, not field-replaceable.
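The capacity figures in the list above follow from stack height times per-die capacity, and per-GPU capacity from stack count times per-stack capacity. The sketch below reproduces that arithmetic; the helper names are hypothetical, and the results are raw capacities (advertised usable capacity can be slightly lower, as with the 141 GB quoted for H200's six 24 GB stacks).

```python
def stack_capacity_gb(dies: int, gb_per_die: int) -> int:
    """Raw capacity of one stack: number of DRAM dies x capacity per die."""
    return dies * gb_per_die


def gpu_raw_capacity_gb(stacks: int, dies: int, gb_per_die: int) -> int:
    """Raw HBM capacity of one GPU package: stack count x per-stack capacity."""
    return stacks * stack_capacity_gb(dies, gb_per_die)


# (label, stacks, dies per stack, GB per die), taken from the figures in the text.
EXAMPLES = [
    ("8 stacks of 8-high x 3 GB", 8, 8, 3),
    ("8 stacks of 12-high x 3 GB", 8, 12, 3),
    ("6 stacks of 8-high x 3 GB", 6, 8, 3),
]

for label, stacks, dies, gb_per_die in EXAMPLES:
    per_stack = stack_capacity_gb(dies, gb_per_die)
    total = gpu_raw_capacity_gb(stacks, dies, gb_per_die)
    print(f"{label}: {per_stack} GB per stack, {total} GB raw per GPU")
```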
Reference: HBM3e specifications and comparison
HBM3e specifications compared to its neighbours in the HBM lineage. HBM3e is a JEDEC-standardised extension (covered under JESD238B amendments through 2023-2024) rather than a new specification — controllers designed for HBM3 work with HBM3e at the lower pin speeds, and full HBM3e bandwidth requires controller and interposer updates that all current AI accelerators ship by default.
| Metric | HBM2e | HBM3 | HBM3e | HBM4 (lookahead) |
|---|---|---|---|---|
| JEDEC spec | JESD235D | JESD238 | JESD238B (extensions) | JESD270-4 |
| First volume | 2019-2020 | Late 2022 (H100) | Q1 2024 (H200) | 2026-2027 |
| Pin speed (max) | 3.6 Gb/s | 6.4 Gb/s | Up to 9.6 Gb/s | 6.4-8.0 Gb/s |
| Interface width per stack | 1,024 bits | 1,024 bits | 1,024 bits | 2,048 bits |
| Channels per stack | 8 (16 pseudo) | 16 | 16 | 32 (likely) |
| Bandwidth per stack (max) | 460 GB/s | 819 GB/s | 1.23 TB/s | ~2.0 TB/s |
| Stack height (max in volume) | 8-high | 8-high (12-high late) | 12-high | 16-high |
| Per-die capacity (typical) | 1-2 GB | 2-3 GB | 3-4 GB | 4-6 GB |
| Per-stack capacity (max) | 16 GB | 24 GB | 36-48 GB | 48-64 GB |
| Voltage | 1.2 V | 1.1 V | 1.1 V | 1.0-1.1 V (TBD) |
| Process node (DRAM) | 1y/1z (~14-16 nm) | 1a (~12 nm) | 1b/1c (~11-12 nm) | 1c/1γ (~10-11 nm) |
| Packaging | CoWoS-S, InFO_oS | CoWoS-S, CoWoS-L | CoWoS-S, CoWoS-L, InFO_oS | CoWoS-L+, advanced interposer |
| Primary vendors | SK hynix, Samsung, Micron | SK hynix (first), Samsung | SK hynix (lead), Micron, Samsung | SK hynix, Micron, Samsung |
| Used in (example accelerators) | A100 80 GB, MI250 | H100, MI300X (partial) | H200, B100/B200/B300, MI325X | Next-gen Blackwell/Rubin, MI400 |
HBM3 versus HBM3e on AMD MI300X is a special case — the original 2023 MI300X shipped with a mix of HBM3 and early HBM3e stacks depending on availability; later MI300X production and all MI325X production standardised on full HBM3e. Treat published bandwidth figures (5.3 TB/s on MI300X) as a launch-era number; refreshed MI300X parts shipped in 2024-2025 often hit closer to 5.6-6.0 TB/s.
Where HBM3e lives in 2026: the accelerator adoption map
HBM3e is the universal high-end AI memory of 2024-2026. Every NVIDIA Hopper-refresh and Blackwell-generation part ships HBM3e, and AMD's Instinct MI325X and MI355X are full HBM3e. Among the specialised inference accelerators that shipped through 2024-2026, Cerebras CS-3 and the Groq LPU rely on on-chip SRAM rather than HBM, while various other startup accelerators standardised on HBM3 or HBM3e based on availability.
- HBM3e is on every leading-edge NVIDIA part from H200 onward and on AMD MI325X/MI355X — there is no current high-end AI GPU SKU that does not use HBM3e.
- Aggregate per-GPU bandwidth is set by stack count x per-stack bandwidth. 6 stacks x 800 GB/s = 4.8 TB/s (H200); 8 stacks x 1.0 TB/s = 8 TB/s (B200); 8 stacks x 1.0 TB/s = 8 TB/s (B300, but with 12-high stacks for 288 GB capacity).
- Pod-scale HBM3e — NVL72 packs 72 x 192 GB = 13.8 TB into a single NVLink-coherent rack-scale domain.
- HBM3e is incompatible with HBM3-only controllers at full speed — controllers must be HBM3e-aware to drive the higher pin signalling.
| Accelerator | HBM3e config | Total capacity | Aggregate bandwidth | Notes |
|---|---|---|---|---|
| NVIDIA H200 SXM5 | 6 stacks x 24 GB (8-high) | 141 GB usable (144 GB raw) | 4.8 TB/s | Same GH100 silicon as H100, HBM3e instead of HBM3 |
| NVIDIA H200 NVL | 6 stacks x 24 GB (8-high) | 141 GB usable (144 GB raw) | 4.8 TB/s | PCIe variant |
| NVIDIA B100 | 8 stacks x 24 GB (8-high) | 192 GB | 8 TB/s | Air-cooled Blackwell variant |
| NVIDIA B200 | 8 stacks x 24 GB (8-high) | 192 GB | 8 TB/s | SXM6 / dual-die Blackwell |
| NVIDIA B300 / GB300 Ultra | 8 stacks x 36 GB (12-high) | 288 GB | 8 TB/s | 12-high stacks, mid-cycle refresh |
| NVIDIA GB200 NVL72 (per GPU) | 8 stacks x 24 GB (8-high) | 192 GB/GPU = 13.8 TB/rack | 8 TB/s/GPU | Rack-scale 72-GPU coherent pool |
| AMD MI300X | 8 stacks (mix HBM3/HBM3e) | 192 GB | 5.3 TB/s launch / 5.6-6.0 TB/s late | Original 2023 SKU had mixed stack supply |
| AMD MI325X | 8 stacks x 32 GB (12-high) | 256 GB | 6.0 TB/s | Full HBM3e, late-2024 launch |
| AMD MI355X | 8 stacks x 36 GB (12-high) | 288 GB | 8.0 TB/s | 2025 refresh, parity with B300 |
| Intel Gaudi 3 | 8 stacks (HBM2e mostly, some HBM3) | 128 GB | 3.7 TB/s | Predates volume HBM3e |
| Groq LPU (gen 2) | n/a (SRAM-based) | n/a | n/a | Listed for contrast — Groq does not use HBM |
| Cerebras WSE-3 | n/a (on-die SRAM) | 44 GB on-die | 21 PB/s on-die | Listed for contrast |
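The stack-count arithmetic behind the aggregate bandwidth and rack-scale figures in this section can be checked with a short sketch. The `HbmConfig` class and `rack_capacity_tb` helper are illustrative names, not part of any vendor tooling, and the per-stack bandwidths are the rounded values used in the bullets above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HbmConfig:
    name: str
    stacks: int
    per_stack_tbs: float  # rounded per-stack bandwidth in TB/s
    capacity_gb: int      # advertised per-GPU capacity in GB

    @property
    def aggregate_tbs(self) -> float:
        """Aggregate per-GPU bandwidth: stack count x per-stack bandwidth."""
        return self.stacks * self.per_stack_tbs


def rack_capacity_tb(gpus: int, capacity_gb: int) -> float:
    """Pooled HBM capacity of a rack-scale domain in decimal TB."""
    return gpus * capacity_gb / 1000


CONFIGS = [
    HbmConfig("H200", 6, 0.8, 141),
    HbmConfig("B200", 8, 1.0, 192),
    HbmConfig("B300", 8, 1.0, 288),
]

for cfg in CONFIGS:
    print(f"{cfg.name}: {cfg.aggregate_tbs:.1f} TB/s, {cfg.capacity_gb} GB")

print(f"NVL72 pool: {rack_capacity_tb(72, 192):.1f} TB")
```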
Supply chain: SK hynix, Micron, Samsung, and the 2024-2025 allocation crisis
HBM3e supply through 2024-2025 was the single largest gating factor on AI infrastructure deployment globally. SK hynix qualified 8-high HBM3e first (Q4 2023, shipping to NVIDIA from Q1 2024 for H200 launch volume), Micron qualified second (Q2 2024, primary US-based supply, shipping to NVIDIA from mid-2024), and Samsung qualified later in 2024 after well-documented yield difficulties on its initial HBM3e bin. By late 2025, all three vendors were shipping 12-high HBM3e in volume, but the supply chain remained tight through 2026 because of allocation politics — the largest hyperscaler customers and NVIDIA itself absorb the majority of incremental capacity.
Pricing per HBM3e stack landed in the $200-$400 range in volume through 2024-2025 (8-high 24 GB stacks), with 12-high 36 GB stacks pricing at $350-$700 depending on bin and customer. A B200 GPU at 8 stacks is therefore $1,600-$3,200 in HBM cost alone before the GPU die, the interposer, the packaging, the cooling and the carrier — HBM commonly accounts for 35-50 % of a high-end AI GPU's bill of materials. This is why HBM capacity, not GPU silicon, has been the actual production constraint on the AI build-out.
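As a rough check on the per-GPU HBM cost quoted above, the range is simply stack count times the per-stack price band. The sketch below uses the price bands from this section as inputs; the function name is hypothetical, and the output is only as reliable as those estimated price ranges.

```python
from typing import Tuple


def hbm_cost_range_usd(stacks: int, price_low: int, price_high: int) -> Tuple[int, int]:
    """Low and high HBM cost per GPU: stack count x per-stack price band."""
    return stacks * price_low, stacks * price_high


# Price bands quoted in the text (USD per stack, 2024-2025 volume pricing).
PRICE_8_HIGH = (200, 400)
PRICE_12_HIGH = (350, 700)

low, high = hbm_cost_range_usd(8, *PRICE_8_HIGH)
print(f"8 x 8-high 24 GB stacks:  ${low:,} to ${high:,}")

low, high = hbm_cost_range_usd(8, *PRICE_12_HIGH)
print(f"8 x 12-high 36 GB stacks: ${low:,} to ${high:,}")
```

The second line applies the same arithmetic to an eight-stack 12-high configuration, which is an extrapolation from the quoted price band rather than a figure stated in this entry.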
Geographic concentration is the other supply story. SK hynix and Samsung both manufacture HBM3e in South Korea; Micron manufactures in Taiwan (Taichung) and Hiroshima, Japan, with US Idaho expansion ramping. JEDEC standardisation means parts are second-sourceable in principle, but qualification cycles at NVIDIA and AMD typically take 6-9 months per vendor-bin combination, so single-source supply at launch is the norm.
- Vendor share (rough 2025 volume): SK hynix ~50-55 %, Micron ~25-30 %, Samsung ~20-25 % — shifting toward more balanced 2026 as Samsung yields improve.
- Per-stack pricing (2024-2025): $200-$400 for 8-high 24 GB, $350-$700 for 12-high 36 GB stacks in volume.
- HBM as share of accelerator BoM: 35-50 % typical for high-end AI GPUs in 2024-2026.
- Qualification cycle: 6-9 months per vendor-bin combination at NVIDIA/AMD.
- Geographic concentration: predominantly South Korea (SK hynix, Samsung) + Taiwan/Japan (Micron); US Idaho ramp from late 2025.
- Allocation politics: the largest customers (hyperscalers, NVIDIA itself for HGX modules) consume the majority of incremental HBM3e capacity; long-tail neoclouds and smaller OEMs frequently constrained.
HBM4 transition and lookahead
HBM4 (JEDEC JESD270-4, published in 2025) is the next generation displacing HBM3e at the leading edge from late 2026 onward. The headline change is interface width: HBM4 doubles the per-stack interface from 1,024 bits to 2,048 bits. Pin speed targets 6.4-8.0 Gb/s, which is similar to HBM3 and below top-bin HBM3e, so the bandwidth uplift comes from the wider bus, not faster pins: at 8.0 Gb/s a stack delivers ~2.0 TB/s, 2.5x HBM3 and roughly 1.7x top-bin HBM3e. HBM4 also targets 16-high stacks at 48-64 GB capacity. First HBM4 silicon sampled in late 2025 (SK hynix engineering samples to NVIDIA and AMD), with volume production targeting 2026-2027 on next-generation parts.
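To see how much of the HBM4 uplift comes from bus width rather than pin speed, the generations can be compared with the same width-times-pin-speed formula. This is a sketch using the pin speeds and widths quoted in this entry; the HBM4 figures are targets, and the names are illustrative.

```python
def stack_bandwidth_tbs(interface_bits: int, pin_speed_gbps: float) -> float:
    """Peak per-stack bandwidth in TB/s: pins x Gb/s per pin / 8 bits / 1000."""
    return interface_bits * pin_speed_gbps / 8 / 1000


GENERATIONS = {
    "HBM3 @ 6.4": (1024, 6.4),
    "HBM3e @ 9.6": (1024, 9.6),
    "HBM4 @ 6.4": (2048, 6.4),
    "HBM4 @ 8.0": (2048, 8.0),
}

hbm3e_top = stack_bandwidth_tbs(*GENERATIONS["HBM3e @ 9.6"])

for label, (bits, pin_speed) in GENERATIONS.items():
    bandwidth = stack_bandwidth_tbs(bits, pin_speed)
    print(f"{label:>12}: {bandwidth:.2f} TB/s ({bandwidth / hbm3e_top:.2f}x top-bin HBM3e)")
```

At the same 6.4 Gb/s pin speed, the doubled bus gives HBM4 exactly twice HBM3's bandwidth, which isolates the contribution of interface width.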
The wider interface has packaging consequences. The 2,048-bit-per-stack bus pushes the interposer routing density beyond what conventional CoWoS-S can support; next-generation packaging (CoWoS-L with longer interposers, hybrid bonding rather than microbumps, and chiplet-style HBM-on-base-die integration) is part of the HBM4 transition. The base die also becomes more capable — vendors are positioning HBM4 base dies as potential locations for compute logic (PIM, processing-in-memory) and for memory controller offload, which would change the HBM-to-GPU relationship from 'dumb pool' to 'active near-data tier'.
Operational consequence for buyers: HBM3e remains the volume part through 2026 on B100/B200/B300/GB200 NVL72 and on AMD MI325X/MI355X. HBM4 first appears on next-generation parts (NVIDIA Rubin-class, AMD MI400-class) launching late 2026 to 2027. Treat HBM3e as the stable production memory for the current AI buildout and HBM4 as the lookahead generation for capacity-planning conversations starting 2027.
Pitfalls and operational notes
- Thermal: 12-high HBM3e stacks dissipate more power than 8-high (~12-18 W per stack at full bandwidth, versus ~8-12 W for 8-high). Total HBM thermal load on a B300 (8x 12-high) therefore lands at roughly 100-145 W for memory alone, so cooling design must account for stack-level thermals, not just die thermals.
- Pin-speed variation by vendor and bin — different vendors hit different sustained speeds, and per-batch validation is normal at the accelerator OEM level. Published bandwidth figures (4.8 TB/s on H200, 8 TB/s on B200) are typical-case; some accelerator builds run at slightly lower aggregate bandwidth depending on the HBM3e bin they were assembled with.
- Not field-replaceable — HBM3e stacks are bonded to the silicon interposer in CoWoS-S/L packaging. A failed stack means the GPU is RMA'd, not repaired. Stack-level ECC and HBM repair (via spare TSVs and redundant rows/columns) extend useful life, but ultimate failure mode is full-GPU replacement.
- Supply constraints — through 2024-2025, HBM3e capacity was tighter than GPU die capacity at NVIDIA/AMD. Allocation politics still affect availability for non-top-tier customers in 2026; lead times for HBM3e-bearing GPUs frequently exceed lead times for the GPU silicon itself.
- Pricing volatility — HBM3e per-stack pricing moved by ~30 % between Q1 2024 and Q4 2025 as supply caught up. Capex models that lock in HBM cost more than 12 months out are speculative.
- Compatibility with HBM3 controllers — HBM3e is backward-compatible electrically with HBM3 controllers at HBM3 speeds (6.4 Gb/s), but full HBM3e bandwidth requires controller and interposer updates. There is no scenario where you 'upgrade' an H100 to HBM3e by swapping stacks — the controller, base die and interposer would all need to change. H200 exists as a separate SKU for this reason.
- ECC and remapped-row counters: HBM3e supports stack-level ECC with single-bit correction and double-bit detection per channel. Operational monitoring should track `DCGM_FI_DEV_RETIRED_DBE` / `DCGM_FI_DEV_RETIRED_SBE` and the row-remap fields such as `DCGM_FI_DEV_UNCORRECTABLE_REMAPPED_ROWS` (NVIDIA) or the ROCm equivalents, and treat steady climbs as predictive of stack failure within weeks.
- Packaging supply — CoWoS-S/L capacity at TSMC is a parallel constraint to HBM die capacity. The full GPU + HBM + interposer + packaging supply chain has multiple bottlenecks that must all clear together for volume to ship.
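The ECC note above says to treat a steady climb in the retirement counters as a warning sign. One way to make "steady climb" concrete is sketched below. It assumes you already collect periodic readings of a cumulative counter (for example, one sample per day from your telemetry pipeline); the collection step, the function names, and the `window` and `min_increases` thresholds are all hypothetical and would need tuning against real fleet data.

```python
from typing import Sequence


def count_increases(samples: Sequence[int]) -> int:
    """Number of consecutive sample pairs where the cumulative counter went up."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur > prev)


def is_steadily_climbing(samples: Sequence[int], window: int = 7, min_increases: int = 3) -> bool:
    """Flag a counter that rose in at least `min_increases` intervals of the last `window` samples.

    A single burst of errors counts as one increase and is not flagged;
    repeated increases across the window are.
    """
    recent = samples[-window:]
    return count_increases(recent) >= min_increases


# Illustrative daily readings of a cumulative retired-page counter (made-up data).
stable = [2, 2, 2, 2, 2, 2, 2]
one_burst = [0, 0, 0, 4, 4, 4, 4]
climbing = [2, 2, 3, 3, 5, 6, 8]

for name, series in [("stable", stable), ("one_burst", one_burst), ("climbing", climbing)]:
    print(name, is_steadily_climbing(series))
```

Counting intervals with an increase, rather than comparing first and last values, is a deliberate choice here: it separates a one-off event from a counter that keeps moving.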
Where this fits in the Yobitel stack
HBM3e is the memory generation under every high-end accelerator in the Yobitel stack in 2026. Yobibyte schedules workloads across H200, B100, B200, B300, GB200 NVL72 and AMD MI325X/MI355X pools — every one of those SKUs is HBM3e-backed, and the platform's placement layer is aware of the per-GPU HBM capacity (141 GB on H200, 192 GB on B100/B200/GB200, 256 GB on MI325X, 288 GB on B300/MI355X) and bandwidth (4.8 TB/s on H200, 6.0 TB/s on MI325X, 8 TB/s on B200/B300) when matching workloads to capacity.
Omniscient Compute — our cross-cloud capacity broker — surfaces HBM capacity per GPU as a first-class scheduling attribute. Workloads with KV-cache-heavy decode patterns or memory-pressured weight footprints are matched to HBM3e tiers with sufficient bandwidth headroom, not just sufficient capacity; the broker treats H200's 4.8 TB/s and B200's 8 TB/s as fundamentally different SKUs even though both ship with HBM3e.
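To illustrate what matching on both capacity and bandwidth means, here is a toy placement filter. It is not Omniscient Compute's actual logic: the `GpuTier` class, the `pick_tier` function and the tie-breaking rule (prefer the lowest-bandwidth tier that still satisfies both requirements) are assumptions made for this sketch, and the tier figures are the per-GPU numbers quoted in this section.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GpuTier:
    name: str
    hbm_gb: int     # per-GPU HBM capacity in GB
    hbm_tbs: float  # per-GPU HBM bandwidth in TB/s


TIERS = (
    GpuTier("H200", 141, 4.8),
    GpuTier("MI325X", 256, 6.0),
    GpuTier("B200", 192, 8.0),
    GpuTier("B300", 288, 8.0),
)


def pick_tier(need_gb: float, need_tbs: float, tiers: Sequence[GpuTier] = TIERS) -> Optional[GpuTier]:
    """Return the smallest tier meeting both the capacity and bandwidth needs, or None."""
    candidates = [t for t in tiers if t.hbm_gb >= need_gb and t.hbm_tbs >= need_tbs]
    if not candidates:
        return None
    # Prefer the least bandwidth, then the least capacity, that still fits.
    return min(candidates, key=lambda t: (t.hbm_tbs, t.hbm_gb))


for need_gb, need_tbs in [(130, 4.0), (150, 5.0), (100, 7.0), (300, 1.0)]:
    tier = pick_tier(need_gb, need_tbs)
    print(f"need {need_gb} GB at {need_tbs} TB/s -> {tier.name if tier else 'no single-GPU fit'}")
```

A workload needing 100 GB fits on every tier by capacity, but a 7 TB/s bandwidth requirement rules out H200 and MI325X, which is the distinction the paragraph above draws.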
InferenceBench publishes throughput and latency numbers by HBM tier — the same model run on H100 HBM3 versus H200 HBM3e versus B200 HBM3e gives three measurably different cost-per-token curves, and the gap widens with context length. The HBM3e adoption tables in this entry are anchored on InferenceBench's per-SKU coverage; if you are evaluating whether a workload benefits from moving to HBM3e-class memory, the benchmark data will tell you whether the bandwidth uplift translates to throughput uplift on your specific traffic shape.
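A common first-order way to reason about why HBM bandwidth shapes decode throughput is the memory-bound estimate: if every generated token has to stream the model weights and the active KV cache from HBM once, tokens per second cannot exceed bandwidth divided by bytes read per token. The sketch below applies that bound. It is a simplification under stated assumptions (single stream, no batching, no compute limit, no overlap effects), the 70 GB weight footprint and KV cache sizes are hypothetical, and it is not a substitute for measured benchmark data.

```python
def decode_tokens_per_s_upper_bound(bandwidth_tbs: float, weights_gb: float, kv_cache_gb: float = 0.0) -> float:
    """Upper bound on single-stream decode rate for a memory-bound model.

    Assumes each token reads all weights plus the KV cache from HBM exactly once.
    """
    bytes_per_token_gb = weights_gb + kv_cache_gb
    return bandwidth_tbs * 1000 / bytes_per_token_gb


WEIGHTS_GB = 70.0  # hypothetical model footprint, for illustration only

for name, bandwidth_tbs in [("H200 (4.8 TB/s)", 4.8), ("B200 (8 TB/s)", 8.0)]:
    for kv_cache_gb in (0.0, 10.0, 40.0):
        bound = decode_tokens_per_s_upper_bound(bandwidth_tbs, WEIGHTS_GB, kv_cache_gb)
        print(f"{name}, KV cache {kv_cache_gb:>4.0f} GB: <= {bound:6.1f} tokens/s")
```

As the KV cache grows with context length, the bytes read per token grow with it, so the absolute gap in the bound between the two bandwidth tiers becomes a larger share of what either tier can deliver.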