TL;DR
- GGML is both a C tensor library (still active) and the name of an early single-file model format (now legacy).
- The tensor library underpins llama.cpp, whisper.cpp and other ggerganov projects; it provides quantised dtypes and CPU-first execution.
- The model format was superseded first by GGJT, then by GGUF in August 2023.
- Old `.ggml` and `.bin` files are still found in the wild but most tools have dropped support; convert to GGUF for modern workflows.
Two Things Called GGML
GGML refers to two related-but-distinct things. The first is the C tensor library written by Georgi Gerganov — a small, dependency-free implementation of tensor operations with quantised dtypes, designed for CPU and accelerator backends. The library remains very much alive and powers llama.cpp and whisper.cpp.
The second is a single-file model format that llama.cpp used in its early days. That format went through several iterations (GGML, GGMF, GGJT) before being replaced by GGUF in August 2023, because the older formats lacked the metadata flexibility needed for new model architectures.
Tensor Library
- C99, no external dependencies.
- Quantised dtypes: Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1 and the K-quants (Q2_K through Q6_K).
- Operators include matrix multiplication, attention, RoPE, normalisation, activations.
- Backends: CPU (with AVX, AVX2, AVX-512, NEON, AMX paths), CUDA, Metal, Vulkan, SYCL, ROCm.
- Used by llama.cpp, whisper.cpp, stable-diffusion.cpp, bark.cpp.
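To make the quantised dtypes listed above more concrete, here is a minimal Python sketch of block quantisation in the style of Q8_0: 32 weights share one scale, and each weight is stored as a signed 8-bit integer. It is an illustration, not ggml's code. The function names are hypothetical, the scale is kept as a Python float rather than the 16-bit float ggml stores, and Python's `round` breaks ties differently from C's `roundf`.

```python
BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32 values


def quantize_q8_0_block(values):
    """Quantise one block of floats to (scale, list of int8-range quants).

    Hypothetical helper for illustration; not part of ggml's API.
    """
    if len(values) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(values)}")

    # The largest magnitude in the block maps to +/-127.
    amax = max(abs(v) for v in values)
    scale = amax / 127.0
    if scale == 0.0:
        # An all-zero block needs no scaling.
        return 0.0, [0] * BLOCK_SIZE

    quants = [round(v / scale) for v in values]
    return scale, quants


def dequantize_q8_0_block(scale, quants):
    """Recover approximate floats: each weight is scale * quant."""
    return [scale * q for q in quants]


if __name__ == "__main__":
    block = [float(i - 16) for i in range(BLOCK_SIZE)]
    scale, quants = quantize_q8_0_block(block)
    restored = dequantize_q8_0_block(scale, quants)
    worst = max(abs(a - b) for a, b in zip(block, restored))
    print(f"scale={scale:.6f} worst-case error={worst:.6f}")
```

With a 2-byte scale and 32 one-byte values, a block in this layout takes 34 bytes for 32 weights, or 8.5 bits per weight. The lower-bit dtypes follow the same idea with fewer bits per value and, in some variants, extra per-block parameters.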
Legacy Format
Old GGML model files used a fixed-layout binary structure with limited metadata. As model architectures diversified — RoPE variants, GQA, sliding windows, MoE — adding new fields required breaking the format. GGUF replaced it with a key-value metadata section that lets new architectures land without breaking existing readers.
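The key-value idea can be sketched with a small reader for the start of a GGUF file. The sketch assumes the layout described in the GGUF specification for version 2 and later, stored little-endian: a 4-byte `GGUF` magic, a 32-bit version, 64-bit tensor and key-value counts, then typed key-value pairs. The function names are hypothetical, and tensor data and alignment are not handled.

```python
import struct

GGUF_MAGIC = b"GGUF"

# Scalar value types: GGUF type id -> little-endian struct format.
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(fmt, buf, pos):
    """Unpack one value at pos and return (value, new position)."""
    (value,) = struct.unpack_from(fmt, buf, pos)
    return value, pos + struct.calcsize(fmt)


def _read_string(buf, pos):
    """Strings are a 64-bit length followed by UTF-8 bytes."""
    length, pos = _read("<Q", buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def _read_value(vtype, buf, pos):
    if vtype in _SCALARS:
        return _read(_SCALARS[vtype], buf, pos)
    if vtype == _STRING:
        return _read_string(buf, pos)
    if vtype == _ARRAY:
        # Arrays carry an element type and a count, then the elements.
        elem_type, pos = _read("<I", buf, pos)
        count, pos = _read("<Q", buf, pos)
        items = []
        for _ in range(count):
            item, pos = _read_value(elem_type, buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(buf):
    """Return (version, tensor_count, metadata dict) from GGUF bytes."""
    if buf[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, pos = _read("<I", buf, 4)
    if version < 2:
        raise ValueError("this sketch only handles GGUF version 2 or later")
    tensor_count, pos = _read("<Q", buf, pos)
    kv_count, pos = _read("<Q", buf, pos)

    metadata = {}
    for _ in range(kv_count):
        key, pos = _read_string(buf, pos)
        vtype, pos = _read("<I", buf, pos)
        metadata[key], pos = _read_value(vtype, buf, pos)
    return version, tensor_count, metadata
```

A reader built this way can look up the keys it understands, such as `general.architecture`, and ignore any it does not recognise. That is what lets a new architecture add fields without changing the container layout.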
Practically, any model file with a `.ggml` extension or marked as 'GGJT' should be considered legacy. Tools dropped support over 2023-2024; convert to GGUF (or download a re-published GGUF version) for use with current llama.cpp builds.
Pre-GGUF model files are not readable by current tools. If you find a `.ggml` or `.bin` model that no longer loads, the dependable fix is to reconvert from the original checkpoint (for example, safetensors) to GGUF, or to download a re-published GGUF version.
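When it is unclear which generation a file belongs to, the first four bytes usually settle it. The sketch below assumes the magic values used by llama.cpp's loaders: `ggml`, `ggmf` and `ggjt` written as little-endian 32-bit integers, and the byte string `GGUF` for the current format. `identify_model_format` is a hypothetical helper, not part of any library.

```python
import struct

# Legacy magics as 32-bit integers. Written little-endian, they appear
# byte-reversed on disk (for example, 'ggjt' is stored as b"tjgg").
LEGACY_MAGICS = {
    0x67676D6C: "GGML (unversioned)",  # 'ggml'
    0x67676D66: "GGMF",                # 'ggmf'
    0x67676A74: "GGJT",                # 'ggjt'
}


def identify_model_format(header: bytes) -> str:
    """Classify a model file from its first four bytes."""
    if len(header) < 4:
        return "unknown"
    if header[:4] == b"GGUF":
        return "GGUF"
    (magic,) = struct.unpack("<I", header[:4])
    return LEGACY_MAGICS.get(magic, "unknown")


def identify_model_file(path: str) -> str:
    """Read the first four bytes of a file and classify it."""
    with open(path, "rb") as handle:
        return identify_model_format(handle.read(4))
```

A result of `GGUF` means the file is in the current format; any of the legacy names means it needs converting or replacing. `unknown` may simply indicate a different kind of `.bin` file, since that extension is shared with other checkpoint formats.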
When to Care
Engineers writing custom inference for non-LLM models (audio, image, scientific) sometimes prefer GGML as a tensor library because of its simplicity and broad hardware reach. For LLM users the library is an implementation detail; the user-visible format is GGUF.