TL;DR
- Cut-down GA100 with 24 GB HBM2 at 933 GB/s: the same HBM memory type as A100, with less capacity, at a fraction of the cost.
- 165 W PCIe card supporting MIG (up to 4 slices), positioned between A10 and A100.
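- Strong for inference of 13B-class models where HBM bandwidth matters but A100 is over-spec. At 13B parameters the weights must be quantised (INT8 or lower) to fit, since FP16 weights alone take about 26 GB.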
- Modest deployment relative to A10/A100; effectively niche by 2026.
Overview
The A30 occupies an unusual slot in the Ampere lineup. Unlike A10 (GA102 with GDDR6), A30 is built on the same GA100 die as A100, but with fewer SMs enabled and only 24 GB of HBM2. The result is a card with HBM-class bandwidth at a price point well below A100.
By 2026 the A30 is mostly seen in pre-existing enterprise inference fleets. New deployments tend to bypass it for L40S (better single-card throughput) or A100/H100 (better $/training-token).
Specifications
| Metric | A30 |
|---|---|
| Architecture | Ampere (GA100) |
| FP64 (Tensor) | 10.3 TFLOPS |
| FP32 | 10.3 TFLOPS |
| TF32 (Tensor, sparse) | 165 TFLOPS |
| BF16 / FP16 (Tensor, sparse) | 330 TFLOPS |
| INT8 (Tensor, sparse) | 661 TOPS |
| Memory | 24 GB HBM2 |
| Memory bandwidth | 933 GB/s |
| TDP | 165 W |
| Form factor | PCIe Gen4 x16, dual-slot |
| NVLink | 200 GB/s (bridge, optional) |
| MIG instances | Up to 4 |
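
The Tensor Core rows above are quoted with structured sparsity. As a rough sketch, assuming the usual Ampere convention that 2:4 sparsity doubles peak throughput (an assumption, not something the table states; the datasheet's own dense figures may be rounded differently), the dense peaks can be estimated by halving the sparse ones:

```python
# Sparse Tensor Core peaks copied from the table above
# (TFLOPS for TF32 and BF16/FP16, TOPS for INT8).
SPARSE_PEAK = {
    "TF32": 165.0,
    "BF16/FP16": 330.0,
    "INT8": 661.0,
}

# Assumption: 2:4 structured sparsity doubles peak throughput,
# so the dense peak is roughly half the sparse figure.
SPARSITY_SPEEDUP = 2.0


def dense_peak(sparse_value: float) -> float:
    """Estimate a dense peak from a sparse peak under the 2x assumption."""
    return sparse_value / SPARSITY_SPEEDUP


for name, sparse in SPARSE_PEAK.items():
    print(f"{name}: ~{dense_peak(sparse):.1f} dense vs {sparse:.0f} sparse")
```

Most real models are not pruned to a 2:4 pattern, so the dense estimate is usually the more relevant ceiling when sizing a deployment.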
Why HBM at This Tier
A30 exists because some inference workloads — long-sequence transformer decode in particular — are bandwidth-bound, not FLOPS-bound. A10's 600 GB/s of GDDR6 limits these workloads; A30's 933 GB/s of HBM2 substantially closes the gap to A100 at a lower price.
Pairing HBM2 with 10.3 TFLOPS of non-Tensor FP32 compute gives the card a high bandwidth-to-FLOPS ratio: about 0.09 bytes of memory traffic per FP32 FLOP (933 GB/s divided by 10.3 TFLOPS). For memory-bound inference shapes this can produce per-watt throughput close to A100 80 GB.
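To make the bandwidth argument concrete, here is a back-of-the-envelope sketch. It assumes a simplified model in which every decoded token requires reading all weights once from memory, so the single-stream token rate is capped at bandwidth divided by weight bytes. The 7B-parameter FP16 model is a hypothetical example, and the function name is illustrative only. Measured throughput will differ because of KV-cache traffic, kernel overheads, and batching.

```python
def decode_tokens_per_s_ceiling(bandwidth_gb_s: float,
                                params_billion: float,
                                bytes_per_param: float) -> float:
    """Upper bound on single-stream decode rate when weight reads dominate."""
    # 1e9 parameters * bytes per parameter = gigabytes of weights.
    weight_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / weight_gb


A10_BW_GB_S = 600.0  # GDDR6 figure quoted in the text
A30_BW_GB_S = 933.0  # HBM2 figure quoted in the text

# Hypothetical workload: 7B parameters stored as FP16 (2 bytes each).
a10 = decode_tokens_per_s_ceiling(A10_BW_GB_S, 7.0, 2.0)
a30 = decode_tokens_per_s_ceiling(A30_BW_GB_S, 7.0, 2.0)

print(f"A10 ceiling: {a10:.1f} tok/s")
print(f"A30 ceiling: {a30:.1f} tok/s")
print(f"A30 / A10:   {a30 / a10:.3f}x")
```

Under this model the A30-to-A10 ratio reduces to 933 / 600 regardless of model size, which is why the advantage only shows up when the workload is actually bandwidth-bound.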
When to Pick A30
- Bandwidth-bound LLM inference where GDDR6 is too slow but A100 is too expensive.
- Multi-tenant inference using MIG to host four hardware-isolated slices.
- Pre-existing fleets where total cost of ownership has been amortised.
- Pick L40S instead for raw inference throughput on dense models.
- Pick A100 / H100 instead if training is in scope or 24 GB is insufficient.
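Whether 24 GB is sufficient can be checked with simple arithmetic before committing to the card. The sketch below is a rough estimate, not a sizing tool: it counts only weight bytes plus a fixed reserve, and the 4 GB `headroom_gb` default is a hypothetical placeholder for KV cache, activations, and runtime overhead that should be replaced with a measured value.

```python
A30_MEMORY_GB = 24.0  # from the specifications table

# Bytes per parameter for two common weight formats.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0}


def weight_footprint_gb(params_billion: float, dtype: str) -> float:
    """Weights only: 1e9 parameters * bytes per parameter = gigabytes."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits_on_a30(params_billion: float, dtype: str,
                headroom_gb: float = 4.0) -> bool:
    """True if weights plus the (hypothetical) reserve fit in 24 GB."""
    needed = weight_footprint_gb(params_billion, dtype) + headroom_gb
    return needed <= A30_MEMORY_GB


for size, dtype in [(7.0, "fp16"), (13.0, "fp16"), (13.0, "int8")]:
    verdict = "fits" if fits_on_a30(size, dtype) else "does not fit"
    print(f"{size:.0f}B {dtype}: {weight_footprint_gb(size, dtype):.0f} GB weights, {verdict}")
```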
Pitfalls
- HBM2 (not HBM2e or HBM3) caps bandwidth well below modern parts.
- Limited availability — A30 was less popular than A10 / A100 and is harder to procure in 2026.
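- MIG slices on A30 do not match A100 slices: four instances of 6 GB each, versus up to seven instances of 5 GB (A100 40 GB) or 10 GB (A100 80 GB). Sizing must be re-validated when moving workloads between the two.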
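The MIG sizing point can be sketched as a lookup. With 24 GB split across at most four instances, the smallest slice is nominally 6 GB. The profile names below (`1g.6gb`, `2g.12gb`, `4g.24gb`) follow NVIDIA's MIG naming for A30 as best I know it and should be confirmed against the profile list reported by the driver; usable memory per slice is also somewhat below the nominal figure. The helper name is hypothetical.

```python
from typing import Optional, Tuple

# (profile name, nominal memory per instance in GB, max instances per A30).
# Assumed from NVIDIA's MIG naming; verify on the target driver.
A30_MIG_PROFILES = [
    ("1g.6gb", 6, 4),
    ("2g.12gb", 12, 2),
    ("4g.24gb", 24, 1),
]


def smallest_profile_for(required_gb: float) -> Optional[Tuple[str, int]]:
    """Return (profile, max instances) for the smallest slice that fits."""
    # Profiles are listed in ascending memory order, so the first hit wins.
    for name, mem_gb, max_instances in A30_MIG_PROFILES:
        if required_gb <= mem_gb:
            return name, max_instances
    return None  # does not fit on a single A30 at all


for need in (5.0, 8.0, 30.0):
    print(f"{need:.0f} GB -> {smallest_profile_for(need)}")
```

A workload tuned for a different slice size on A100 may land in a larger A30 profile than expected, which changes how many tenants fit on one card.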
- No FP8 support; FP8-based quantised inference paths do not run on A30, which leaves INT8 as the main hardware-accelerated quantised format.
Software Notes
The A30 is supported by CUDA 11.x, 12.x, and 13, and runs vLLM and TensorRT-LLM (minus any FP8-only features). MIG configuration uses the same nvidia-smi mig commands as A100, with different slice profiles.
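As an illustration of the MIG workflow, the sketch below builds the `nvidia-smi` command lines for splitting one A30 into equal slices. The helper names are hypothetical, and the `1g.6gb` profile name is an assumption that should be checked against the output of the profile-listing command on the target driver. The script only prints the commands; actually running them needs root privileges and an idle GPU.

```python
import shlex
import subprocess
from typing import List


def mig_setup_commands(gpu_index: int, profile: str, count: int) -> List[List[str]]:
    """Build (but do not run) the commands to create `count` equal MIG slices."""
    gpu = str(gpu_index)
    profiles = ",".join([profile] * count)
    return [
        # Enable MIG mode on the selected GPU.
        ["nvidia-smi", "-i", gpu, "-mig", "1"],
        # List the GPU instance profiles the driver offers for this card.
        ["nvidia-smi", "mig", "-i", gpu, "-lgip"],
        # Create the GPU instances and their default compute instances.
        ["nvidia-smi", "mig", "-i", gpu, "-cgi", profiles, "-C"],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Assumed profile name for a quarter-card slice; verify with -lgip first.
commands = mig_setup_commands(gpu_index=0, profile="1g.6gb", count=4)
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping command construction separate from execution makes it easy to review the exact invocation before touching a production GPU.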
References
- NVIDIA A30 Datasheet · NVIDIA