TL;DR
- Qdrant is an open-source vector database written in Rust, released in 2021 under Apache 2.0 by Qdrant GmbH (Berlin).
- Distinguishing features: high-performance metadata filtering during ANN search, native hybrid search, scalar/product/binary quantisation, and a clean REST + gRPC API.
- Ships as a single binary (or Docker image), which makes self-hosting easy; the managed offering, Qdrant Cloud, is available on AWS, GCP and Azure across major regions.
- Qdrant's own published benchmarks place it among the fastest open-source vector engines, especially on filtered queries, where many other engines degrade.
Architecture
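A Qdrant cluster is a set of peer nodes. A Raft-based consensus layer keeps cluster-wide metadata, such as topology and collection configuration, consistent across them. Collections are split into shards that are distributed (and optionally replicated) across nodes. Each shard keeps its own write-ahead log and stores points in segments: appendable segments absorb new writes, while optimised segments are immutable and carry the HNSW index. A background optimiser merges small segments, vacuums deleted points, and builds indexes on segments that have grown large enough.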
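To make the shard/segment split concrete, here is a toy Python model. It is a sketch under simplifying assumptions, not Qdrant's implementation: `ToyShard`, `Segment`, and the capacity-based sealing rule are hypothetical, and real segments also hold HNSW and payload indexes, which are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    points: dict[int, list[float]] = field(default_factory=dict)
    deleted: set[int] = field(default_factory=set)  # tombstones, dropped on optimise
    appendable: bool = True


class ToyShard:
    """Hypothetical shard model: one appendable segment takes writes, full
    segments are sealed (immutable), and optimize() merges sealed segments."""

    def __init__(self, segment_capacity: int = 4):
        self.capacity = segment_capacity
        self.wal: list[tuple[str, int]] = []  # shard-level log of operations
        self.sealed: list[Segment] = []
        self.active = Segment()

    def _retire(self, pid: int) -> None:
        # Sealed segments are immutable, so older copies are only tombstoned.
        for seg in self.sealed:
            if pid in seg.points:
                seg.deleted.add(pid)
        self.active.points.pop(pid, None)

    def upsert(self, pid: int, vector: list[float]) -> None:
        self.wal.append(("upsert", pid))  # log first, then apply
        self._retire(pid)
        self.active.points[pid] = vector
        if len(self.active.points) >= self.capacity:
            self.active.appendable = False
            self.sealed.append(self.active)
            self.active = Segment()

    def delete(self, pid: int) -> None:
        self.wal.append(("delete", pid))
        self._retire(pid)

    def optimize(self) -> None:
        # Merge all sealed segments into one, dropping tombstoned points.
        merged = {
            pid: vec
            for seg in self.sealed
            for pid, vec in seg.points.items()
            if pid not in seg.deleted
        }
        self.sealed = [Segment(points=merged, appendable=False)] if merged else []

    def live_points(self) -> dict[int, list[float]]:
        live = {
            pid: vec
            for seg in self.sealed
            for pid, vec in seg.points.items()
            if pid not in seg.deleted
        }
        live.update(self.active.points)
        return live
```

Tombstoning instead of deleting in place is what lets sealed segments stay immutable; the space is reclaimed later when the optimiser merges them.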
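The dense-vector index is HNSW, optionally combined with scalar, product, or binary quantisation. Original vectors can be stored on disk (`on_disk: true`) instead of in RAM: the files are memory-mapped, so the operating system pages in only the vectors a search actually touches. Keeping compact quantised vectors in RAM and rescoring the top candidates against the on-disk originals cuts memory cost substantially at a moderate latency cost.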
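The on-disk idea can be illustrated with Python's standard `mmap` module. This is a generic sketch of memory-mapped vector storage plus rescoring, not Qdrant's file format; the fixed `DIM`, the row layout, and the helper names (`MmapVectors`, `build_store`, `rescore`) are assumptions for illustration.

```python
import mmap
import os
import struct
import tempfile

DIM = 4
ROW = struct.Struct(f"<{DIM}f")  # one little-endian float32 vector per row


class MmapVectors:
    """Read-only float32 vectors in a memory-mapped file. Rows are decoded
    on demand; the OS pages in only the parts of the file that get read."""

    def __init__(self, path: str):
        self.path = path
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self) -> int:
        return len(self._mm) // ROW.size

    def get(self, idx: int) -> tuple[float, ...]:
        # With dense internal ids, a row's offset is just idx * row size.
        return ROW.unpack_from(self._mm, idx * ROW.size)

    def close(self) -> None:
        self._mm.close()
        self._file.close()
        os.remove(self.path)


def build_store(vectors) -> MmapVectors:
    """Write vectors to a temporary file and memory-map it."""
    fd, path = tempfile.mkstemp(suffix=".f32")
    with os.fdopen(fd, "wb") as f:
        for v in vectors:
            f.write(ROW.pack(*v))
    return MmapVectors(path)


def rescore(store: MmapVectors, query, candidate_ids, k: int):
    """Re-rank candidates (e.g. from a quantised in-RAM index) by exact dot
    product against the full-precision vectors read from disk."""
    def exact(i):
        return sum(a * b for a, b in zip(store.get(i), query))
    return sorted(candidate_ids, key=exact, reverse=True)[:k]
```

Only the shortlisted candidates are read from the file, which is why this pattern keeps the RAM footprint close to that of the quantised vectors alone.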
Filterable HNSW
Most vector databases handle metadata filters in one of two ways. Pre-filtering first finds the points that match the filter, then runs nearest-neighbour search over only that subset: results are exact, but the search is slow when the filter matches a large fraction of the corpus, because it degenerates into brute force over a big subset. Post-filtering runs ANN search normally and then drops results that fail the filter: it is fast, but it can return fewer than k results when the filter is selective.
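The two strategies can be contrasted in a few lines of Python. In this sketch the "ANN index" used by post-filtering is simulated by exact ranking truncated to `n_candidates` results; the function names and the tiny tenant corpus are hypothetical.

```python
import math

# Hypothetical corpus layout: id -> (vector, payload)
Points = dict[int, tuple[list[float], dict]]


def pre_filter_search(points: Points, query, k, pred):
    """Filter first, then brute-force the matching subset: exact results."""
    matching = [pid for pid, (_, payload) in points.items() if pred(payload)]
    matching.sort(key=lambda pid: math.dist(points[pid][0], query))
    return matching[:k]


def post_filter_search(points: Points, query, k, pred, n_candidates):
    """Search without the filter, then drop non-matching hits.
    Exact ranking stands in for an ANN index returning n_candidates."""
    ranked = sorted(points, key=lambda pid: math.dist(points[pid][0], query))
    return [pid for pid in ranked[:n_candidates] if pred(points[pid][1])][:k]


points = {i: ([float(i)], {"tenant": "a" if i < 8 else "b"}) for i in range(10)}
is_b = lambda payload: payload["tenant"] == "b"
print(pre_filter_search(points, [0.0], 2, is_b))                   # [8, 9]
print(post_filter_search(points, [0.0], 2, is_b, n_candidates=4))  # []
```

Tenant `b` owns 2 of the 10 points and both lie far from the query, so post-filtering with 4 candidates returns nothing while pre-filtering finds both; raising `n_candidates` recovers them only at extra cost.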
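Qdrant's filterable HNSW evaluates the filter inside the graph search itself: the traversal only scores and expands points that pass the filter. Skipping nodes this way can disconnect the graph when the filter is strict, so Qdrant also builds extra HNSW links for the values of indexed payload fields, keeping each filtered subgraph navigable. When the payload index estimates that a filter matches very few points, the query planner skips the graph and scores the matching points exactly instead. Qdrant reports that, as a result, heavily filtered queries (selecting 1-10% of the corpus) run nearly as fast as unfiltered ones.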
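A simplified Python sketch of the in-graph filtering idea, on a single graph layer with no HNSW hierarchy: `filtered_beam_search` never scores or expands nodes that fail the filter, and the example shows how hand-added links between matching nodes (standing in for Qdrant's payload-based extra links) keep the filtered subgraph reachable. The function names, the chain graph, and the `allowed` predicate are hypothetical, and the cardinality-based planner is not modelled.

```python
import heapq
import math


def filtered_beam_search(graph, vectors, query, entry, k, ef, allowed):
    """Best-first search over one graph layer that skips nodes failing
    `allowed`; keeps the ef best matches and returns the top k ids."""
    if not allowed(entry):
        return []
    d0 = math.dist(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap (negated): current best ef
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # nothing left can improve the result set
        for nb in graph[node]:
            if nb in visited or not allowed(nb):
                continue  # filtered-out nodes are never scored or entered
            visited.add(nb)
            dn = math.dist(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return [node for _, node in sorted((-nd, n) for nd, n in results)][:k]


def add_links(graph, extra):
    """Return a copy of graph with extra directed edges appended."""
    merged = {node: list(nbs) for node, nbs in graph.items()}
    for node, nbs in extra.items():
        merged.setdefault(node, []).extend(nbs)
    return merged


vectors = {i: [float(i)] for i in range(6)}
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
even = lambda n: n % 2 == 0
linked = add_links(chain, {0: [2], 2: [0, 4], 4: [2]})
```

On the plain chain, every even node is separated from the next by an odd one, so a search filtered to even ids starting at node 0 stops immediately and returns only `[0]`; with the extra links between even nodes it reaches the true nearest matches, `[4, 2]`.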
If your RAG workload routinely scopes searches by tenant, document type, or date range, filtered-search performance is the single most important benchmark to run: many engines lose recall or latency sharply once a filter selects less than 10% of the corpus.
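A minimal harness for that benchmark, in plain Python: it measures recall@k of any search function against exact filtered ground truth at several filter selectivities. The corpus is random, the filter is a hypothetical `id % stride == 0` rule, and `post_filter_topk` is only a stand-in baseline; to benchmark a real engine, one would wrap its client call in a function with the same `(vectors, query, k, allowed)` signature.

```python
import math
import random


def exact_filtered_topk(vectors, query, k, allowed):
    """Ground truth: brute force over the ids that pass the filter."""
    ids = [i for i in range(len(vectors)) if allowed(i)]
    ids.sort(key=lambda i: math.dist(vectors[i], query))
    return ids[:k]


def post_filter_topk(vectors, query, k, allowed, candidates=50):
    """Baseline under test: unfiltered top-`candidates`, then filter."""
    ids = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))
    return [i for i in ids[:candidates] if allowed(i)][:k]


def recall_by_selectivity(search_fn, n=2000, dim=8, k=10, queries=20,
                          selectivities=(1.0, 0.1, 0.01), seed=0):
    """Mean recall@k of search_fn for filters keeping ~s of the corpus."""
    rng = random.Random(seed)
    vectors = [[rng.random() for _ in range(dim)] for _ in range(n)]
    report = {}
    for s in selectivities:
        stride = max(1, round(1 / s))  # keep every stride-th id

        def allowed(i, stride=stride):
            return i % stride == 0

        total = 0.0
        for _ in range(queries):
            q = [rng.random() for _ in range(dim)]
            truth = set(exact_filtered_topk(vectors, q, k, allowed))
            got = set(search_fn(vectors, q, k, allowed))
            total += len(truth & got) / len(truth)
        report[s] = total / queries
    return report
```

Running `recall_by_selectivity(post_filter_topk)` shows how the baseline's recall behaves as the filter tightens; the same call with an engine-backed search function gives the comparison that matters for a filtered workload.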
Quantisation
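| Type | Compression (vs float32) | Recall impact | Notes |
|---|---|---|---|
| Scalar (int8) | 4x | Negligible | Common starting point; fast, SIMD-friendly distance computation |
| Product | 4-64x | Noticeable; rescoring recovers much of it | k-means codebook per sub-vector; slower indexing and search than scalar |
| Binary | 32x | Modest on high-dimensional models, larger otherwise | Best with high-dimensional embeddings; pair with oversampling and rescoring |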
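The two simplest schemes in the table can be sketched in a few lines of Python. This is an illustration, not Qdrant's implementation: the scalar version uses one global min/max range (Qdrant can instead clip the range to a quantile), the binary version keeps one sign bit per dimension, and `oversampling` follows the idea behind Qdrant's search-time parameter of the same name rather than its exact behaviour. Product quantisation is omitted because it needs a trained codebook.

```python
import math


def scalar_quantize(vectors):
    """8-bit scalar quantisation with one shared [lo, hi] range."""
    lo = min(min(v) for v in vectors)
    hi = max(max(v) for v in vectors)
    step = (hi - lo) / 255 or 1.0  # guard against a constant corpus
    codes = [bytes(round((x - lo) / step) for x in v) for v in vectors]
    return codes, lo, step


def scalar_dequantize(code, lo, step):
    return [lo + c * step for c in code]


def binary_quantize(v):
    """One bit per dimension: set when the component is positive."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    return bin(a ^ b).count("1")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def binary_search(vectors, codes, query, k, oversampling=2.0):
    """Shortlist by Hamming distance on the bit codes, then rescore the
    shortlist with exact dot products on the original vectors."""
    qcode = binary_quantize(query)
    n_cand = min(len(vectors), math.ceil(k * oversampling))
    shortlist = sorted(range(len(vectors)), key=lambda i: hamming(codes[i], qcode))
    rescored = sorted(shortlist[:n_cand], key=lambda i: dot(vectors[i], query),
                      reverse=True)
    return rescored[:k]
```

Each scalar code is off by at most half a quantisation step per component, which is why its recall impact is small. For binary codes, many vectors collapse onto the same bit pattern: with `oversampling=1.0` the Hamming shortlist alone decides and ties are broken by position, so a better match with an identical code can be lost, while a larger shortlist lets exact rescoring recover it.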
Hybrid Search
Qdrant supports hybrid retrieval by storing several named vectors per point (for example one dense and one sparse), querying each, and fusing the result lists with Reciprocal Rank Fusion (RRF) or Distribution-Based Score Fusion (DBSF). Sparse vectors can be hand-built (BM25-style term weights) or produced by a learned sparse encoder such as SPLADE; Qdrant indexes and scores both kinds the same way.
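Both fusion rules are small enough to sketch directly. The RRF constant `k=60` is the value common in the literature and may not match the constant Qdrant uses internally. The DBSF function follows Qdrant's documented description (normalise each result list using mean +/- 3 standard deviations as limits, then sum per point); using the population standard deviation and the 0.5 fallback for constant scores are assumptions of this sketch.

```python
import statistics


def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank) per id."""
    scores = {}
    for ranking in ranked_lists:
        for rank, pid in enumerate(ranking, start=1):
            scores[pid] = scores.get(pid, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def dbsf(scored_lists):
    """Distribution-Based Score Fusion: rescale each list's scores using
    mean +/- 3 standard deviations as the limits, then sum per id."""
    fused = {}
    for results in scored_lists:  # each list holds (id, score) pairs
        scores = [s for _, s in results]
        mu, sigma = statistics.fmean(scores), statistics.pstdev(scores)
        lo, hi = mu - 3 * sigma, mu + 3 * sigma
        for pid, s in results:
            norm = (s - lo) / (hi - lo) if hi > lo else 0.5
            fused[pid] = fused.get(pid, 0.0) + norm
    return sorted(fused, key=fused.get, reverse=True)


dense = [("a", 0.9), ("b", 0.8), ("c", 0.1)]   # e.g. cosine similarities
sparse = [("c", 12.0), ("a", 3.0), ("d", 1.0)]  # e.g. BM25-style scores
print(rrf([[pid for pid, _ in dense], [pid for pid, _ in sparse]]))
print(dbsf([dense, sparse]))
```

RRF ignores raw scores and uses only ranks, so the mismatch between cosine and BM25 scales never matters; DBSF keeps score magnitudes but puts each list on a comparable scale before summing.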
When to Pick Qdrant
Pick Qdrant when filtered search performance matters, when you want Apache 2.0 with no enterprise-feature lock-in, or when memory efficiency at scale is critical. It is a weaker fit if you need deep integration with one cloud provider's ML stack, or if you prefer Weaviate's more opinionated module model.