Qdrant

TL;DR

Qdrant is an open-source vector database written in Rust, released in 2021 under Apache 2.0 by Qdrant GmbH (Berlin).
Distinguishing features: high-performance metadata filtering during ANN search, native hybrid search, scalar/product/binary quantisation, and a clean REST + gRPC API.
Single-binary deployment makes it easy to self-host; the managed offering, Qdrant Cloud, runs on AWS, GCP and Azure across major regions.
It is often cited as one of the fastest vector engines on public benchmarks, especially for filtered queries, where many other engines degrade.

Architecture

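A Qdrant cluster is a set of peer nodes that each store data, coordinated via a Raft-based consensus layer for cluster metadata (topology and collection configuration). Collections are sharded across nodes. Each shard keeps a write-ahead log plus a set of segments: appendable segments absorb new writes, while optimised (non-appendable) segments are read-only apart from deletion marks and carry HNSW indices. Background optimisers merge segments and rebuild their graphs.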
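A toy model of the shard and segment layout may make the write and search paths concrete. This is a minimal sketch, assuming brute-force scoring in place of HNSW, one appendable segment per shard, and a write-ahead log kept next to the segments; the class and method names (Shard, Segment, upsert, optimise, search) are hypothetical and are not Qdrant's internal API. It shows an update to a sealed point being tombstoned rather than rewritten, and a search fanning out over every segment before merging results.

```python
import heapq
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


@dataclass
class Segment:
    points: dict = field(default_factory=dict)  # point id -> vector
    deleted: set = field(default_factory=set)   # tombstones; sealed data is never rewritten
    appendable: bool = True                     # a sealed segment would carry an HNSW index


@dataclass
class Shard:
    wal: list = field(default_factory=list)     # write-ahead log of operations
    segments: list = field(default_factory=lambda: [Segment()])

    def upsert(self, point_id, vector):
        # Log first, so the operation can be replayed after a crash.
        self.wal.append(("upsert", point_id, vector))
        # Sealed segments are not edited in place: mark any old copy deleted...
        for seg in self.segments:
            if not seg.appendable and point_id in seg.points:
                seg.deleted.add(point_id)
        # ...and write the new version into the appendable segment (always last).
        self.segments[-1].points[point_id] = vector

    def optimise(self):
        # Merge live points from all segments into one sealed segment (where a
        # real optimiser would build HNSW), then open a fresh appendable segment.
        merged = Segment(appendable=False)
        for seg in self.segments:
            for pid, vec in seg.points.items():
                if pid not in seg.deleted:
                    merged.points[pid] = vec
        self.segments = [merged, Segment()]

    def search(self, query, k):
        # Fan out to every segment (brute force here), then merge per-segment top-k.
        per_segment = []
        for seg in self.segments:
            scored = [(cosine(query, v), pid) for pid, v in seg.points.items()
                      if pid not in seg.deleted]
            per_segment.extend(heapq.nlargest(k, scored))
        return [pid for _, pid in heapq.nlargest(k, per_segment)]


shard = Shard()
shard.upsert(1, [1.0, 0.0])
shard.upsert(2, [0.0, 1.0])
shard.optimise()
shard.upsert(1, [0.6, 0.8])           # overwrite a point that now lives in a sealed segment
print(shard.search([1.0, 0.0], k=2))  # [1, 2]: the stale [1.0, 0.0] copy is tombstoned
```

Tombstoning keeps sealed segments (and their graphs) untouched between optimiser runs; the cost is that deleted points linger until the next merge.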

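The vector index is HNSW, optionally combined with scalar, product, or binary quantisation. Vectors can be configured as 'on disk' (the on_disk option) to move them out of RAM: a compact in-memory map from point IDs to storage offsets stays in RAM, and vectors are fetched lazily from memory-mapped files during search. This dramatically lowers memory cost at a moderate latency cost, which depends largely on disk speed.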
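The lazy on-disk fetch can be sketched with Python's standard mmap module. This is an illustration under stated assumptions, not Qdrant's storage format: it assumes fixed-dimension float32 vectors stored back to back, so an internal offset alone locates a vector's bytes, and it uses a plain dict with hypothetical IDs for the in-memory ID-to-offset map.

```python
import mmap
import os
import struct
import tempfile

DIM = 4                             # hypothetical fixed dimensionality
RECORD = struct.Struct(f"<{DIM}f")  # one little-endian float32 vector


def write_vectors(path, vectors):
    """Store vectors back to back, so offset i starts at byte i * RECORD.size."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(RECORD.pack(*vec))


class MmapVectors:
    """Read-only, memory-mapped vector storage addressed by internal offset."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Mapping reads nothing yet; pages are loaded when first touched.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._mm) // RECORD.size

    def get(self, offset):
        # Lazy fetch: the OS pages in only the page(s) holding this vector.
        return RECORD.unpack_from(self._mm, offset * RECORD.size)

    def close(self):
        self._mm.close()
        self._file.close()


# The small in-memory part: external point ID -> storage offset (hypothetical IDs).
id_to_offset = {"doc-a": 0, "doc-b": 1}

fd, path = tempfile.mkstemp()
os.close(fd)
write_vectors(path, [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]])
store = MmapVectors(path)
print(store.get(id_to_offset["doc-b"]))  # (4.0, 5.0, 6.0, 7.0)
store.close()
os.remove(path)
```

Only the ID map and whatever pages the OS chooses to cache occupy RAM, which is why latency then tracks disk speed and page-cache hit rate.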

Filterable HNSW

Most vector databases handle metadata filters in one of two ways. Pre-filtering first finds every point that matches the filter, then runs an exact nearest-neighbour search over that subset: results are exact, but the search is slow when the filter matches a large fraction of the corpus. Post-filtering runs ANN search as normal, then drops results that fail the filter: it is fast, but it can return fewer than the requested k results when the filter is selective.
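The two conventional strategies can be compared on a toy corpus. In this sketch the helpers are hypothetical, the "vectors" are 1-D numbers, and brute-force scoring stands in for ANN. Post-filtering with a top-k candidate list returns short when only one point in five matches, while pre-filtering, or post-filtering with a much larger candidate list, returns the full k.

```python
import heapq


def score(q, v):
    """Similarity for 1-D toy vectors: higher means closer."""
    return -abs(q - v)


def pre_filter(points, q, k, pred):
    # 1) collect every point whose payload matches, 2) exact top-k over that subset.
    subset = [(score(q, v), pid) for pid, (v, payload) in points.items() if pred(payload)]
    return [pid for _, pid in heapq.nlargest(k, subset)]


def post_filter(points, q, k, pred, candidates=None):
    # 1) unfiltered top-N search (N defaults to k; stands in for an ANN call),
    # 2) drop candidates whose payload fails the filter.
    n = candidates if candidates is not None else k
    top = heapq.nlargest(n, ((score(q, v), pid) for pid, (v, _) in points.items()))
    return [pid for _, pid in top if pred(points[pid][1])][:k]


# 20 points with value i; one in five belongs to tenant "a".
points = {i: (float(i), {"tenant": "a" if i % 5 == 0 else "b"}) for i in range(20)}


def is_a(payload):
    return payload["tenant"] == "a"


print(pre_filter(points, 4.0, 3, is_a))                  # [5, 0, 10]
print(post_filter(points, 4.0, 3, is_a))                 # [5]: returned short
print(post_filter(points, 4.0, 3, is_a, candidates=20))  # [5, 0, 10]
```

Enlarging the candidate list is the usual patch for post-filtering, but the right size depends on the filter's selectivity, which is rarely known in advance.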

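Qdrant's filterable HNSW threads the filter into the graph search itself: during the greedy walk, neighbours that fail the filter are skipped, so only matching points are expanded and returned. Skipping nodes could disconnect the graph when a filter is selective, so Qdrant adds extra links between points that share indexed payload values when it builds the index, and its query planner switches to an exact scan over the payload index when a filter matches very few points. The result is one of the few systems where heavily filtered queries (selecting 1-10% of the corpus) run almost as fast as unfiltered ones.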
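The effect of checking the filter during the graph walk can be illustrated with a single-layer toy graph. This is a simplified sketch, not Qdrant's Rust implementation: it assumes a chain-shaped proximity graph over 1-D points, an entry point already known to match, and a hypothetical add_payload_links helper that adds edges between points sharing a payload value, mirroring the approach described in Qdrant's filterable HNSW write-up. Without those extra edges the filtered walk is stranded at its entry point.

```python
import heapq


def chain_graph(n):
    """Toy proximity graph: point i links to i - 1 and i + 1."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def add_payload_links(graph, ids):
    """Hypothetical helper: link consecutive points sharing a payload value so
    the subgraph seen by that filter stays connected."""
    for a, b in zip(ids, ids[1:]):
        graph[a].add(b)
        graph[b].add(a)


def filtered_search(graph, vectors, query, entry, ef, pred):
    """Best-first graph search that only expands and returns points passing
    pred. Assumes the entry point itself passes the filter."""
    def dist(pid):
        return abs(vectors[pid] - query)

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap: closest unexpanded point first
    results = [(-dist(entry), entry)]    # max-heap via negation: worst result on top
    while candidates:
        d, pid = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break                        # closest candidate is worse than worst result
        for nb in graph[pid]:
            if nb in visited:
                continue
            visited.add(nb)
            if not pred(nb):
                continue                 # filter checked during the walk: never expanded
            d_nb = dist(nb)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    return [pid for _, pid in sorted((-neg_d, pid) for neg_d, pid in results)]


vectors = {i: float(i) for i in range(20)}
tenant_a = [i for i in range(20) if i % 5 == 0]  # [0, 5, 10, 15]
is_a = set(tenant_a).__contains__

plain = chain_graph(20)
print(filtered_search(plain, vectors, 4.0, entry=15, ef=3, pred=is_a))   # [15]
linked = chain_graph(20)
add_payload_links(linked, tenant_a)
print(filtered_search(linked, vectors, 4.0, entry=15, ef=3, pred=is_a))  # [5, 0, 10]
```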

If your RAG workload routinely scopes searches by tenant, document type, or date range, filtered-search performance is the single most important benchmark to run. Many engines lose throughput or recall sharply once a filter selects less than 10% of the corpus.

Quantisation

Type	Size reduction (vs float32)	Recall impact	Notes
Scalar (int8)	4x	Negligible	Usual first choice for most deployments
Product	4-32x	Small with rescoring	k-means codebooks learned per sub-vector
Binary	32x	Modest on high-dimensional vectors	Usually paired with oversampling and rescoring
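The size reductions in the table follow from the bit widths: 32-bit floats become 8-bit codes (4x) or single bits (32x). The sketch below, in plain Python, shows scalar int8 quantisation and binary quantisation with a Hamming-distance shortlist that is then rescored against the original vectors. It is a simplified illustration, not Qdrant's code: per-vector min/max bounds, the "positive component means bit 1" rule, and the oversampling factor are all assumptions, and product quantisation is omitted for length.

```python
import math
import struct


def scalar_quantise(vec):
    """float -> int8 codes via a linear map over [min, max]. Bounds come from the
    vector itself here; production systems calibrate them over many vectors."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0                        # guard against constant vectors
    codes = [round((x - lo) / scale) - 128 for x in vec]  # each code in [-128, 127]
    return struct.pack(f"<{len(codes)}b", *codes), lo, scale


def scalar_dequantise(data, lo, scale):
    return [(c + 128) * scale + lo for c in struct.unpack(f"<{len(data)}b", data)]


def binary_quantise(vec):
    """1 bit per dimension: set if the component is positive."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits.to_bytes((len(vec) + 7) // 8, "little")


def hamming(a, b):
    return bin(int.from_bytes(a, "little") ^ int.from_bytes(b, "little")).count("1")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_binary_rescored(corpus, query, k, oversampling=2.0):
    """Cheap Hamming pass over 1-bit codes for k * oversampling candidates,
    then rescore those candidates with full-precision dot products."""
    qcode = binary_quantise(query)
    codes = [binary_quantise(v) for v in corpus]  # precomputed and RAM-resident in practice
    n_candidates = max(k, int(k * oversampling))
    shortlist = sorted(range(len(corpus)), key=lambda i: hamming(codes[i], qcode))[:n_candidates]
    return sorted(shortlist, key=lambda i: dot(corpus[i], query), reverse=True)[:k]


vec = [math.sin(i) for i in range(1536)]  # stand-in 1536-dim embedding
raw = struct.pack(f"<{len(vec)}f", *vec)  # float32 storage
codes, lo, scale = scalar_quantise(vec)
print(len(raw) // len(codes), len(raw) // len(binary_quantise(vec)))  # 4 32
```

Rescoring is what keeps binary quantisation usable: the 1-bit codes only need to get the true neighbours into the shortlist, and the full-precision pass fixes the final order.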

Hybrid Search

Qdrant supports hybrid retrieval by storing multiple named vectors per point (typically one dense and one sparse) and fusing the per-vector result lists with Reciprocal Rank Fusion (RRF) or Distribution-Based Score Fusion (DBSF). Sparse vectors can be hand-built (BM25-style) or produced by a learned sparse encoder such as SPLADE; Qdrant stores and searches both kinds the same way.
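Both fusion methods are simple enough to sketch. In this illustration, RRF uses k = 60, the constant commonly used in the RRF literature (Qdrant's own constant may differ), and DBSF normalises each result list using the mean minus and plus three standard deviations as limits before summing per point. The use of population standard deviation and the guard for identical scores are assumptions here, and the example scores are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank), rank 1-based."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, pid in enumerate(ranked, start=1):
            scores[pid] += 1.0 / (k + rank)
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def dbsf(scored_lists):
    """Distribution-Based Score Fusion: normalise each list's raw scores using
    mean - 3 std and mean + 3 std as limits, then sum per point."""
    fused = defaultdict(float)
    for scored in scored_lists:           # each is {point_id: raw_score}
        values = list(scored.values())
        mu, sd = mean(values), pstdev(values)
        lo, hi = mu - 3 * sd, mu + 3 * sd
        span = (hi - lo) or 1.0           # guard: all scores identical
        for pid, s in scored.items():
            fused[pid] += (s - lo) / span
    return sorted(fused, key=lambda pid: (-fused[pid], pid))


dense = {"a": 0.9, "b": 0.8, "c": 0.1}    # e.g. cosine similarities
sparse = {"c": 12.0, "a": 3.0, "d": 1.0}  # e.g. BM25-style dot products
print(rrf([sorted(dense, key=dense.get, reverse=True),
           sorted(sparse, key=sparse.get, reverse=True)]))  # ['a', 'c', 'b', 'd']
print(dbsf([dense, sparse]))                                 # ['a', 'c', 'b', 'd']
```

RRF ignores raw score magnitudes entirely, which makes it robust when dense and sparse scores live on very different scales; DBSF keeps some magnitude information by normalising each list before summing.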

When to Pick Qdrant

Pick Qdrant when filtered search performance matters, when you want an Apache 2.0 licence with no enterprise-feature lock-in, or when memory efficiency at scale is critical. Weigh the alternatives more carefully if you need very deep integration with one cloud provider's ML stack, or if you prefer Weaviate's more opinionated module model.

