TL;DR
- Graphcore's flagship architecture: 1,472 tiles per IPU, each with independent compute and SRAM, coordinated under a Bulk Synchronous Parallel (BSP) execution model.
- On-chip SRAM (900 MB per Bow IPU) replaces HBM; capacity comes from chaining many IPUs.
- The Bow IPU, launched in 2022, used 3D wafer-on-wafer bonding to add power-delivery silicon underneath the compute die.
- Commercial reach contracted significantly through 2023-2024; SoftBank acquired Graphcore in 2024 and the product strategy was restructured.
Overview
The Graphcore IPU was one of the more architecturally distinctive accelerators of the 2018-2024 wave. It used a MIMD (Multiple Instruction, Multiple Data) design: 1,472 independent processor tiles per chip, coordinated by a Bulk Synchronous Parallel (BSP) execution model with deterministic global barriers.
The Bow IPU, launched in 2022, added 3D wafer-on-wafer bonding: a power-delivery die underneath the compute die that improved voltage stability and clock frequency. Each Bow IPU offered 350 TFLOPS of peak FP16 compute and 900 MB of on-chip SRAM.
Graphcore's commercial reach contracted notably through 2023-2024 as transformer workloads moved decisively toward CUDA. SoftBank acquired the company in 2024; the product strategy was restructured.
Specifications
| Metric | Bow IPU |
|---|---|
| Architecture | MIMD, BSP execution |
| Tiles per chip | 1,472 |
| FP16 | 350 TFLOPS |
| FP32 | ~87 TFLOPS |
| On-chip SRAM | ~900 MB |
| External memory | DDR4 via host |
| Process | TSMC 7 nm + power die WoW |
| System | IPU-POD (16 / 64 IPUs) |
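The table's per-chip figures can be turned into per-tile and per-pod numbers with simple arithmetic. The sketch below assumes decimal units and uses the rounded values from the table, so the results are approximate peak figures rather than measured ones; the function names are illustrative only.

```python
# Figures copied from the specifications table; treat them as approximate.
TILES_PER_IPU = 1472
SRAM_MB_PER_IPU = 900          # "~900 MB" in the table
FP16_TFLOPS_PER_IPU = 350      # peak, not sustained

def per_tile_sram_kb(sram_mb: float = SRAM_MB_PER_IPU,
                     tiles: int = TILES_PER_IPU) -> float:
    """Average SRAM per tile in KB (decimal units, 1 MB = 1000 KB)."""
    return sram_mb * 1000 / tiles

def pod_totals(num_ipus: int) -> dict:
    """Aggregate tiles, SRAM, and peak FP16 throughput for a pod of num_ipus."""
    return {
        "tiles": num_ipus * TILES_PER_IPU,
        "sram_gb": num_ipus * SRAM_MB_PER_IPU / 1000,
        "fp16_pflops": num_ipus * FP16_TFLOPS_PER_IPU / 1000,
    }

if __name__ == "__main__":
    print(f"SRAM per tile: ~{per_tile_sram_kb():.0f} KB")
    for n in (16, 64):         # the two pod sizes listed in the table
        print(n, pod_totals(n))
```

Under these assumptions each tile has roughly 611 KB of local memory, and a 64-IPU pod aggregates about 57.6 GB of SRAM across 94,208 tiles.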
Architecture Notes
Each IPU tile pairs a small processor running six hardware threads with its own private SRAM. Tiles exchange data over the on-chip interconnect only during exchange phases; between barriers, every tile executes independently on its local memory. The BSP model is conceptually clean: compute and exchange phases alternate, separated by global synchronisation, which makes program behaviour straightforward to reason about.
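The compute/barrier/exchange cycle can be illustrated with a toy simulation. This is a minimal sketch in plain Python, not Poplar code: `Tile`, `compute_phase`, `exchange_phase`, and `bsp_ring_sum` are hypothetical names, the tiles run sequentially rather than in parallel, and the ring-shaped communication pattern is chosen only to keep the example short.

```python
from dataclasses import dataclass, field

@dataclass
class Tile:
    """Hypothetical stand-in for one tile: private memory plus an inbox."""
    tile_id: int
    value: float                      # private data owned by this tile
    total: float = 0.0                # running sum kept in local memory
    outgoing: float = 0.0             # value to forward in the next exchange
    inbox: list = field(default_factory=list)

def compute_phase(tile: Tile) -> None:
    """Local work only: a tile touches its own memory and inbox, nothing else."""
    received = tile.inbox.pop() if tile.inbox else tile.value
    tile.total += received
    tile.outgoing = received

def exchange_phase(tiles: list) -> None:
    """Runs after the barrier: each tile passes a value to its right neighbour."""
    payloads = [t.outgoing for t in tiles]    # snapshot, so delivery order cannot matter
    for i, payload in enumerate(payloads):
        tiles[(i + 1) % len(tiles)].inbox.append(payload)

def bsp_ring_sum(values: list) -> list:
    """Every tile ends up holding the sum of all values after len(values) supersteps."""
    tiles = [Tile(i, v) for i, v in enumerate(values)]
    for _ in range(len(tiles)):               # one superstep per ring hop
        for t in tiles:                       # compute phase (conceptually parallel)
            compute_phase(t)
        # Global barrier: reaching this line means every tile finished computing.
        exchange_phase(tiles)
    return [t.total for t in tiles]

if __name__ == "__main__":
    print(bsp_ring_sum([1, 2, 3, 4]))
```

Because no tile reads another tile's memory during the compute phase, the order in which the simulated tiles run does not affect the result, which is the property that makes BSP programs easy to reason about.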
Programs target Poplar, a C++ graph-programming framework with PyTorch and TensorFlow front ends. The Poplar compiler lowers user graphs into BSP-scheduled tile programs.
Where IPUs Still Make Sense
- Existing on-prem IPU-POD deployments running fine-tuning or research workloads.
- Workloads where BSP semantics and on-chip SRAM dataflow suit the model architecture.
- Research clusters and education environments where Poplar tooling is already in place.
- For production frontier work in 2026, GPUs or specialised inference accelerators are the safer pick.
Pitfalls
- Software ecosystem reach is narrow and contracted further post-2024.
- External memory bandwidth via the host limits very-large model deployments.
- The roadmap following the SoftBank acquisition remains in flux.
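The memory pitfall can be made concrete with a back-of-envelope estimate of how many IPUs are needed to keep a model's weights resident in on-chip SRAM; beyond that point, weights have to stream from external memory. The sketch below is an assumption-laden lower bound: `ipus_for_weights` is an illustrative helper, and `usable_fraction` is a hypothetical allowance for code, activations, and exchange buffers, not a Graphcore figure.

```python
import math

SRAM_BYTES_PER_IPU = 900 * 10**6   # ~900 MB per Bow IPU, decimal units

def ipus_for_weights(num_params: int,
                     bytes_per_param: int = 2,
                     usable_fraction: float = 0.5) -> int:
    """Lower bound on IPUs needed to hold the weights entirely in SRAM.

    bytes_per_param=2 corresponds to FP16 weights.
    usable_fraction is a hypothetical share of SRAM left for weights
    once code, activations, and exchange buffers are accounted for.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    weight_bytes = num_params * bytes_per_param
    return math.ceil(weight_bytes / (SRAM_BYTES_PER_IPU * usable_fraction))

if __name__ == "__main__":
    for params in (1_000_000_000, 7_000_000_000, 70_000_000_000):
        print(f"{params:,} params -> at least {ipus_for_weights(params)} IPUs")
```

With these assumptions a 7-billion-parameter FP16 model fits within a 64-IPU pod, while a 70-billion-parameter model does not, which is where host-attached memory bandwidth becomes the limiting factor.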
Software Notes
The software stack is the Poplar SDK, with PyTorch and TensorFlow front ends. Hugging Face provided Optimum-Graphcore integration through 2023. Long-term support for newer model architectures depends on Graphcore's restructured roadmap.
References
- Graphcore Bow IPU Brief · Graphcore
- Poplar SDK Documentation · Graphcore