TL;DR
- pgvector is an open-source PostgreSQL extension by Andrew Kane (`ankane`), first released April 2021 under the PostgreSQL License (a permissive BSD-style licence). The 0.8 series (2026) ships HNSW and IVFFlat indices, six distance operators, and four vector types (vector, halfvec, bit, sparsevec).
- Adds a `vector(N)` column type plus the distance operators `<->` (L2), `<#>` (negative inner product), `<=>` (cosine), `<+>` (L1), and Hamming / Jaccard for bit vectors. ORDER BY any operator triggers an index scan; without the operator the planner falls back to sequential.
- Available out of the box on every major managed Postgres — Amazon RDS, Aurora, Cloud SQL, Azure Database for PostgreSQL, Supabase, Neon, Crunchy Data, Tembo, Timescale. Self-hosted install is `CREATE EXTENSION vector;` against any Postgres 13+.
- The pitch is operational: if you already run Postgres, you do not need to add a second database for vectors. Embeddings, metadata, ACLs and application data live in the same tables with transactional consistency, the same backup and replication story, the same monitoring, the same access control model.
- Trade-off versus dedicated vector databases — slower to build large indices (parallel build helps since 0.6 but still trails Qdrant or Milvus by 2-5x), less specialised filter optimisation at extreme scale (above ~500M vectors per node), no built-in vector-aware sharding. Most teams under hundreds of millions of vectors never hit those limits.
Overview#
pgvector is an open-source PostgreSQL extension that adds vector data types, distance operators and approximate-nearest-neighbour (ANN) indices to Postgres. It was created by Andrew Kane (the author of much-loved Postgres extensions like pghero and pgsync) and first released in April 2021. Five years later it is the most-installed vector store in production by a wide margin, almost entirely because it removes the operational question 'should we add a vector database?' from the system-design conversation. The answer for teams already on Postgres is now usually 'no — use the database you already have'.
The 2023 release of HNSW indexing in pgvector 0.5 was the inflection point. Until then, IVFFlat was the only available index and performance lagged dedicated vector engines noticeably. HNSW closed most of that gap, and the 0.6 (parallel index build), 0.7 (halfvec, bit, sparsevec types) and 0.8 (iterative scan with metadata filters, faster bit-vector ops) releases since have made pgvector competitive on every operational dimension that matters for under-500M-vector workloads. Above that scale, dedicated engines (Qdrant, Milvus, Vespa) still win on raw throughput and specialised filter performance.
pgvector's reach is enormous. Every major managed Postgres ships it: Amazon RDS and Aurora (since May 2023), Google Cloud SQL and AlloyDB, Azure Database for PostgreSQL, Supabase (where it is the default vector store and the foundation of Supabase Vector), Neon, Crunchy Data, Tembo, Timescale, DigitalOcean Managed Databases. Self-hosted installs run on any Postgres 13+ with the development headers; the source builds in seconds. The footprint is so broad that almost every Postgres user has it available without adding a single dependency.
This entry is the operational reference for engineers building production systems on pgvector — every SQL clause, every index option, every distance operator, sizing arithmetic at 1M / 10M / 100M / 1B rows, workload patterns for RAG, recommendation and image search, observability hooks, security posture, and migration trade-offs versus FAISS, Pinecone and the open-source dedicated vector engines. Customers running pgvector on a Yobitel NeoCloud-hosted Postgres get the operational benefit of in-database vector search without standing up a separate vector store, and inherit NeoCloud's UK / EU regional posture (NCSC OFFICIAL-aligned) for the same data. This entry helps you decide if pgvector fits your similarity-search needs versus a dedicated vector DB, and if it does, how to size and operate it.
Quick start#
The session below installs pgvector on a Postgres 16 instance, creates a documents table with a 1024-dimensional embedding column, builds an HNSW index for cosine distance, inserts a few rows of embeddings, and runs a similarity search filtered by tenant. It runs today against any Postgres that has the extension available — locally with `docker run pgvector/pgvector:pg16`, or on any managed offering.
-- 1. Enable the extension (one-time, requires the postgres role).
CREATE EXTENSION IF NOT EXISTS vector;
-- 2. Schema for a RAG-style corpus.
CREATE TABLE documents (
id bigserial PRIMARY KEY,
tenant_id uuid NOT NULL,
source_uri text NOT NULL,
chunk_offset int NOT NULL,
chunk_text text NOT NULL,
embedding vector(1024) NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX documents_tenant_idx ON documents (tenant_id);
-- 3. Build an HNSW index for cosine distance.
-- m and ef_construction trade index build cost for query recall.
CREATE INDEX documents_embedding_hnsw_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 200);
-- 4. Insert a few rows (in practice, batched COPY from your ingest pipeline).
INSERT INTO documents (tenant_id, source_uri, chunk_offset, chunk_text, embedding) VALUES
('00000000-0000-0000-0000-000000000001', 's3://docs/policy.md', 0,
'Employees are entitled to 25 days of annual leave plus statutory bank holidays.',
'[0.012, -0.034, ...]'::vector),
('00000000-0000-0000-0000-000000000001', 's3://docs/policy.md', 1,
'Overtime above 40 hours per week is paid at 1.5x the base hourly rate.',
'[0.057, 0.011, ...]'::vector);
-- 5. Query: top-5 chunks by cosine distance, scoped to a tenant.
SET hnsw.ef_search = 100; -- session-scoped, trades latency for recall
SELECT id, source_uri, chunk_text, embedding <=> $1 AS distance
FROM documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1
LIMIT 5;The placeholder $1 in the ORDER BY is the query embedding (a vector(1024) literal or bind parameter). Always parameterise — concatenating embedding literals into SQL blows out the parser and leaks vectors to query logs.
How it works#
pgvector is a standard Postgres extension installed via `CREATE EXTENSION vector;`. It adds new types, operators, operator classes, GUC variables and access methods to the database; everything else (transactions, replication, backups, ACLs, monitoring, foreign data wrappers) is inherited from Postgres itself. Vectors are stored inline in the row tuple like any other Postgres value, with TOAST handling for rows that exceed the page size — so a 1536-dimensional float32 vector (6,144 bytes) lives compressed and out-of-line in the TOAST table.
Distance operators are infix functions that the planner recognises as eligible for index scans. `<->` returns L2 distance, `<#>` returns negative inner product (negated so smaller is closer, matching the L2 convention), `<=>` returns cosine distance (1 minus cosine similarity), `<+>` returns L1 (Manhattan) distance. For bit and sparsevec types, `<~>` is Hamming and `<%>` is Jaccard. The planner uses the index only when the ORDER BY exactly matches the index's operator class — `ORDER BY embedding <=> $1` uses a cosine index, but `ORDER BY embedding <-> $1` does not, even on the same column. Each index is bound to one distance metric.
Two index types ship in 0.8. HNSW (since 0.5, July 2023) is the recommended default. It builds the same Hierarchical Navigable Small World graph as the algorithm's reference implementation (Malkov and Yashunin, 2016) — see hnsw-algorithm. Build parameters `m` (max neighbours per node, default 16, typical production value 24-48) and `ef_construction` (build candidate pool, default 64, typical 100-400) determine index quality and build cost. Query-time parameter `hnsw.ef_search` (default 40) controls the candidate pool size during query and is set as a session GUC; raising it improves recall at linearly higher latency.
IVFFlat (since 0.1) is the older option, still useful for write-heavy workloads where index build cost dominates. It clusters vectors into `lists` Voronoi cells with k-means at build time, and the query-time parameter `ivfflat.probes` (default 1, typical 10-50) controls how many cells are searched. Recall and latency both scale roughly with `probes / lists`. IVFFlat builds faster than HNSW and supports streaming updates without the graph-update overhead, but its recall ceiling is lower at any given latency. The pgvector documentation now recommends HNSW for most workloads and IVFFlat only when build cost is the binding constraint.
From 0.7 onward, pgvector supports three additional vector types beyond the original `vector` (float32). `halfvec(N)` stores float16, halving memory at modest recall cost. `bit(N)` stores binary vectors for Hamming distance — useful with binary-quantised embeddings (Cohere's binary embeddings, JinaAI binary). `sparsevec(N)` stores sparse vectors with explicit (index, value) tuples for terms — the storage backing SPLADE-style learned sparse retrievers inside Postgres. The 0.8 release added iterative scan, which lets the planner work through the HNSW graph in batches when a WHERE clause filters out many candidates, dramatically improving performance on selective-filter queries.
- Vector storage: inline in row, TOAST for large vectors. No separate vector store — vectors live in the same heap as the rest of the row's columns.
- Distance operators: <-> L2, <#> negative inner product, <=> cosine, <+> L1 for float/half vectors; <~> Hamming, <%> Jaccard for bit/sparse.
- Index types: HNSW (recommended) and IVFFlat. Each index binds to one distance metric via its operator class.
- GUC variables: `hnsw.ef_search`, `hnsw.iterative_scan`, `ivfflat.probes`, `ivfflat.iterative_scan` — session- or query-scoped, no restart needed.
- Parallel build: HNSW supports parallel workers via `max_parallel_maintenance_workers` since 0.6; typical 2-4x speedup on multi-core machines.
Reference#
Canonical reference for every SQL clause and operator pgvector adds to Postgres, accurate to the 0.8 series (June 2026). Anything not listed here is documented in the upstream README; everything listed here is operationally relevant in production deployments.
| Construct | Form | Purpose | Notes |
|---|---|---|---|
| Extension install | CREATE EXTENSION vector; | Activates the extension in the current database. | One-time. Requires CREATEROLE or superuser unless preinstalled. |
| vector type | vector(N) | Fixed-dim float32 vector column. | Max 16,000 dim stored; max 2,000 dim for HNSW/IVFFlat indices. |
| halfvec type | halfvec(N) | Float16 vector — half the storage of vector. | 0.7+. Up to 4,000 dim indexed. |
| bit type | bit(N) | Binary vector for Hamming distance. | Postgres built-in type, indexed by pgvector ops. |
| sparsevec type | sparsevec(N) | Sparse vector with (index,value) tuples. | 0.7+. Up to 1,000 non-zero indexed. |
| L2 operator | <-> | Euclidean (L2) distance. | Smaller is closer. |
| Inner product operator | <#> | Negative inner product. | Negated so smaller is closer — matches ORDER BY ASC convention. |
| Cosine operator | <=> | Cosine distance (1 - cosine similarity). | Most common for normalised embeddings. |
| L1 operator | <+> | Manhattan (L1) distance. | 0.7+. |
| Hamming operator | <~> | Hamming distance for bit vectors. | 0.7+. |
| Jaccard operator | <%> | Jaccard distance for bit vectors. | 0.7+. |
| HNSW index | CREATE INDEX ON t USING hnsw (col vector_cosine_ops) WITH (m=24, ef_construction=200); | Build HNSW graph for cosine distance. | Operator classes: vector_l2_ops, vector_ip_ops, vector_cosine_ops, vector_l1_ops, halfvec_*_ops, bit_hamming_ops, etc. |
| IVFFlat index | CREATE INDEX ON t USING ivfflat (col vector_l2_ops) WITH (lists=1000); | Build IVF index with k-means clusters. | Set lists to rows/1000 (≤1M) or sqrt(rows) (>1M). |
| HNSW build options | m (default 16), ef_construction (64) | Graph connectivity and build candidate pool. | Production: m=24-48, ef_construction=100-400. |
| IVFFlat build options | lists (default 100) | Number of Voronoi cells. | Train on a sample if dataset is huge. |
| HNSW query GUC | SET hnsw.ef_search = 100; | Candidate pool size at query time. | Raise for recall, lower for latency. Default 40. |
| IVFFlat query GUC | SET ivfflat.probes = 10; | Number of cells searched per query. | Raise for recall, lower for latency. Default 1. |
| Iterative scan | SET hnsw.iterative_scan = strict_order; | Continue searching past LIMIT under filter pressure. | 0.8+. Use with selective WHERE clauses. |
| Distance functions | l2_distance(a,b), inner_product(a,b), cosine_distance(a,b) | Function form of the operators. | Useful inside aggregates, CTEs. |
| Vector aggregates | AVG(embedding), SUM(embedding) | Centroid / sum across rows. | 0.5+. Used in clustering pipelines. |
| Vector arithmetic | embedding + other_embedding, embedding * 2.0 | Element-wise add, subtract, scalar multiply. | Useful for query rewriting (a + b - c style). |
| Vector helpers | vector_dims(v), vector_norm(v) | Dimension count, L2 norm. | Useful for sanity checks and normalisation. |
| Subvector | subvector(v, start, count) | Slice a vector — supports Matryoshka truncation. | 0.7+. |
| Binary quantise | binary_quantize(v) | Convert float vector to bit vector by sign. | 0.7+. For binary-vector hybrid pipelines. |
Operator class mismatch is the most common pgvector pitfall. A HNSW index built with `vector_cosine_ops` accelerates ORDER BY `<=>` but not `<->`. The planner silently falls back to sequential scan; the query just gets slow. Always EXPLAIN ANALYZE the query and confirm the index appears in the plan.
Workload patterns#
Three workload shapes account for the bulk of pgvector deployments: RAG inside an existing Postgres-backed application; hybrid recommendation systems where vector similarity is one signal among many in a SQL query; and image / multimodal similarity search backing product discovery or content moderation. Each pattern uses pgvector differently.
Pattern A — RAG inside an existing Postgres app. The corpus chunks, embeddings and per-tenant ACLs all live in Postgres. Hybrid retrieval combines a pgvector cosine search with Postgres full-text search (`tsvector` / `tsquery`) or ParadeDB's Lucene-based BM25, fused with Reciprocal Rank Fusion in a CTE. Critical operational decision: enforce tenant scoping at the SQL layer (`WHERE tenant_id = $1`), not at the application layer.
Pattern B — Recommendation with hybrid SQL filter plus vector similarity. The classic query is 'find products similar to this one, in stock, in the user's region, under $100, excluding what they've already bought'. The non-vector predicates are highly selective; the vector ranking is the tiebreaker on the surviving rows. pgvector 0.8's iterative scan dramatically improves this pattern by continuing past LIMIT until the WHERE clause is satisfied.
Pattern C — Image and multimodal similarity. Vectors come from CLIP, SigLIP or a domain-specific encoder; embeddings are 512-1024 dim float32. The dominant queries are reverse image search ('find products that look like this uploaded photo') and content moderation ('flag uploads similar to known prohibited content'). Indexing pattern is the same as RAG; the differences are at the application layer (image-encoder service, deduplication on Hamming distance with binary quantisation, large blob storage for the originals).
-- Pattern A: RAG hybrid retrieval (BM25 via ParadeDB + cosine via pgvector + RRF)
-- Requires the pg_search extension from ParadeDB for BM25.
WITH dense AS (
SELECT id, chunk_text,
ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank
FROM documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1
LIMIT 50
),
sparse AS (
SELECT id, chunk_text,
ROW_NUMBER() OVER (ORDER BY paradedb.score(id) DESC) AS rank
FROM documents
WHERE tenant_id = $2
AND chunk_text @@@ $3 -- ParadeDB BM25 match operator
LIMIT 50
)
SELECT id, chunk_text, SUM(rrf) AS score
FROM (
SELECT id, chunk_text, 1.0 / (60 + rank) AS rrf FROM dense
UNION ALL
SELECT id, chunk_text, 1.0 / (60 + rank) AS rrf FROM sparse
) fused
GROUP BY id, chunk_text
ORDER BY score DESC
LIMIT 10;
-- Pattern B: hybrid recommendation with selective filters
SET hnsw.iterative_scan = strict_order;
SELECT p.id, p.title, p.price,
p.embedding <=> $1 AS distance
FROM products p
WHERE p.region_id = $2
AND p.in_stock = true
AND p.price < 100
AND p.id NOT IN (SELECT product_id FROM purchases WHERE user_id = $3)
ORDER BY p.embedding <=> $1
LIMIT 20;
-- Pattern C: image dedup with binary quantisation for fast Hamming
SELECT id, image_uri,
binary_quantize(embedding) <~> binary_quantize($1) AS hamming
FROM images
WHERE binary_quantize(embedding) <~> binary_quantize($1) < 32
ORDER BY hamming
LIMIT 50;Pattern B's selective filters are where pre-0.8 pgvector struggled — the HNSW search would return LIMIT rows before the filter could be applied, producing too-few results. Iterative scan (`hnsw.iterative_scan`) fixes this; on 0.7 and earlier, the workaround was raising `hnsw.ef_search` substantially or using IVFFlat with high `probes`.
Sizing and capacity planning#
pgvector's storage and memory footprint are predictable. Per-row storage for a `vector(N)` column is approximately `4*N + 8` bytes (4 bytes per float32 plus a small header) plus the heap-row overhead Postgres adds for every tuple. HNSW index overhead is roughly `M * 8` bytes of pointer storage per vector per layer the vector appears in, plus the vector itself — practically, a 1024-dimensional float32 corpus with HNSW (m=32) takes about 1.5x the raw vector storage for the index.
Build time and memory at scale follow the sizing table below. Anchors are mid-range observations from `maintenance_work_mem = 4GB`, four parallel workers, 1024-dimensional vectors, modern x86 (Intel Sapphire Rapids or AMD Genoa) Postgres on local NVMe. Treat as planning guides; benchmark on your actual hardware before committing.
Query latency is dominated by graph traversal cost. On HNSW with `m=32` and `ef_search=100`, a single query against 10 M vectors typically completes in 2-8 ms cold, 0.5-2 ms warm. At 100 M vectors, expect 5-20 ms warm. Filter-heavy queries scale worse — selective WHERE clauses on metadata columns can hurt performance on the older sequential-filter path; iterative scan (0.8+) brings the worst case under control. Concurrency scales with Postgres's connection model — use PgBouncer in transaction-pooling mode for high-QPS workloads, since each Postgres backend is a process and HNSW search holds CPU.
Memory-wise, pgvector benefits dramatically from having both the heap (vectors) and the index pages cached. Aim for `shared_buffers` of 25-40% of instance RAM and rely on OS page cache for the rest. The HNSW graph is not memory-mappable from cold storage without huge latency penalties (page faults dominate); size RAM to comfortably hold the index for hot workloads. Quantisation helps: `halfvec` halves footprint at typically <1% recall loss; binary quantisation cuts footprint 32x but is appropriate only for first-stage filtering where a re-rank pass restores precision.
| Corpus | Raw vector size | HNSW index size | HNSW build time | RAM during build | Practical posture |
|---|---|---|---|---|---|
| 1 M vectors | ~4 GB | ~6 GB | ~5-15 minutes | 8-16 GB | Fits comfortably on a single small Postgres instance. |
| 10 M vectors | ~40 GB | ~60 GB | ~1-3 hours | 32-64 GB | Single instance; raise `maintenance_work_mem` to 16-32 GB. |
| 100 M vectors | ~400 GB | ~600 GB | ~12-30 hours | 128-256 GB | Single large instance or partitioned by tenant; parallel build essential. |
| 1 B vectors | ~4 TB | ~6 TB | Days, parallelise across partitions | Per-partition fit | Partition by tenant or hash; shard across nodes with Citus; consider Qdrant / Milvus. |
Limits and quotas#
pgvector enforces some hard ceilings, mostly inherited from Postgres internals. Knowing them up front avoids painful migrations later.
| Limit | Value | Source | How to work around |
|---|---|---|---|
| Max stored vector dimensions | 16,000 | pgvector | Use Matryoshka truncation; very few embedding models exceed 4,096 dims anyway. |
| Max indexed dimensions (HNSW/IVFFlat, vector) | 2,000 | pgvector index code | Use halfvec for up to 4,000 dim indexed; truncate embeddings; multi-index split. |
| Max indexed dimensions (halfvec) | 4,000 | pgvector index code | Sufficient for OpenAI text-embedding-3-large at 3,072 dims. |
| Max non-zero entries (sparsevec indexed) | 1,000 | pgvector | Limits SPLADE-style sparse retrievers; usually adequate. |
| Max rows per table | ~32 TiB heap, ~4 B tuples | Postgres | Partition or shard before reaching either limit. |
| Single-connection concurrency | 1 query at a time | Postgres process model | Use PgBouncer transaction pooling; size pool to (CPU cores x 2). |
| Parallel HNSW build workers | max_parallel_maintenance_workers | Postgres GUC | Raise to (cores - 2); also raise maintenance_work_mem accordingly. |
| Maintenance work mem for build | default 64 MB | Postgres GUC | Raise to 4-32 GB during index build; revert after. |
| Cross-shard / cross-database vector search | Not supported natively | Postgres | Use Citus, postgres_fdw federation, or push aggregation to application. |
| Concurrent index build (CONCURRENTLY) | Supported for HNSW since 0.6 | pgvector | Use CREATE INDEX CONCURRENTLY for zero-downtime builds at the cost of slower build. |
The 2,000-dim cap on indexed `vector` columns rules out indexing 3,072-dim text-embedding-3-large without halfvec, and is the most common reason teams discover 0.7's halfvec type the hard way. Plan for halfvec from day one if you are using OpenAI's large embedder or any 3,072+ dim model.
Observability#
pgvector inherits Postgres's deep observability surface — pg_stat_statements for per-query timing, pg_stat_user_indexes for index hit rates, pg_buffercache for shared-buffers analysis, auto_explain for slow queries — and exposes one extra GUC family (`hnsw.*`, `ivfflat.*`) worth monitoring. The metrics worth alerting on are the ones that distinguish a healthy ANN workload from a misconfigured one: index hit rate, sequential scans on vector tables, p95 latency, and HNSW iterative-scan invocations under filter pressure.
- pg_stat_user_indexes.idx_scan on the HNSW index — should grow with query traffic. Flat or near-zero means the planner is falling back to sequential scan; check operator-class match.
- pg_stat_user_tables.seq_scan on the vector table — should be near zero in steady state. A spike correlates with a query that uses an unindexed distance operator or no ORDER BY.
- pg_stat_statements.mean_exec_time for vector queries — alert if p95 climbs past your SLO; usually points to ef_search too high or RAM pressure forcing page faults.
- shared_buffers hit ratio on the index pages — should be >99% for hot workloads. Lower means the working set has outgrown shared_buffers or OS cache.
- Connection saturation — pg_stat_activity.count vs max_connections. Add PgBouncer if you regularly exceed 50% of max_connections.
- Index bloat — pgvector indices accumulate dead tuples on UPDATE/DELETE-heavy tables. REINDEX CONCURRENTLY periodically (monthly for high-churn corpora).
- Replication lag (pg_stat_replication) — vector index build is replicated to standbys; sustained lag during large builds is normal but bears watching for read-after-write workloads.
-- 1. Per-query slow-query analysis via pg_stat_statements.
SELECT round(mean_exec_time::numeric, 2) AS mean_ms,
calls,
round(stddev_exec_time::numeric, 2) AS stddev_ms,
query
FROM pg_stat_statements
WHERE query ILIKE '%<=>%' OR query ILIKE '%<->%'
ORDER BY mean_exec_time DESC
LIMIT 20;
-- 2. Confirm the HNSW index is being used.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM documents
WHERE tenant_id = '...'
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
-- Look for: "Index Scan using documents_embedding_hnsw_idx"
-- Not: "Seq Scan on documents"
-- 3. Index hit rate.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%embedding%';
-- 4. Index bloat estimate (after long-running churn).
SELECT pg_size_pretty(pg_relation_size('documents_embedding_hnsw_idx')) AS idx_size,
pg_size_pretty(pg_relation_size('documents')) AS heap_size;Cost and FinOps#
pgvector's cost economics are inherited from Postgres. Self-hosted on Yobitel, AWS, GCP or Azure, you pay for compute, memory, NVMe storage and outbound network. Managed Postgres services (RDS, Aurora, Cloud SQL, Supabase, Neon) charge per instance-hour and storage-GB-month; the pgvector extension itself is free. The table below uses representative June 2026 list pricing for a RAG-shaped workload (1024-dim vectors, HNSW indices, mixed read/write at moderate QPS) — substitute your own committed-use rates when planning.
- FOCUS-conformant billing exports tag each Postgres instance by tenant and environment, so per-corpus cost-per-million-queries is trivial to compute.
- Storage cost typically dwarfs compute cost above 100 M vectors — halfvec, sparsevec or binary quantisation are the highest-leverage cost levers.
- Read replicas scale read QPS linearly; vector index queries are pure reads and benefit fully. Writes still serialise to the primary.
- Multi-AZ deployments roughly double the instance cost but are non-negotiable for production. Backups are an additional ~5-15% on top.
- Reserved instances or savings plans on AWS / GCP / Azure cut instance cost 30-60% on stable workloads.
| Posture | Corpus | Recommended instance | Monthly cost (USD) | Notes |
|---|---|---|---|---|
| Dev / prototype | 100 K vectors | Local Docker or t3.medium | $0-30 | Single-instance, no replica. Fits in laptop RAM. |
| Small production | 1 M vectors | RDS db.m6g.xlarge (4 vCPU, 16 GB) | $280-450 | Multi-AZ, daily backups. Comfortable for early-stage RAG. |
| Mid-size production | 10 M vectors | RDS db.r6g.4xlarge (16 vCPU, 128 GB) | $1,800-2,800 | shared_buffers 32 GB. Add read replica for QPS scale. |
| Large production | 100 M vectors | RDS db.r6g.16xlarge (64 vCPU, 512 GB) | $8,500-12,000 | Or Aurora Postgres for IO-heavy ingest. Consider partitioning by tenant. |
| Very large / sharded | 1 B vectors | Citus on 4-8 nodes (r6g.8xlarge each) | $18,000-32,000 | Hash-partition by tenant_id. Dedicated vector DB (Milvus, Qdrant) becomes competitive at this scale. |
| Supabase (managed) | Up to ~10 M vectors | Pro tier + add-on compute | $25-1,000 | Cheapest fully-managed start; pricing scales with compute add-ons. |
| Neon (serverless) | Variable | Pay per CPU-second and storage | $0-2,000 | Scales to zero between queries; good for spiky workloads. |
| Tembo Cloud (vector stack preset) | Up to 100 M vectors | Per-instance vector preset | $200-5,000 | Pre-tuned pgvector with parallel build and monitoring. |
Security and compliance#
pgvector inherits the full Postgres security model — role-based access control with GRANT and REVOKE, row-level security (RLS) for per-row authorisation, SSL/TLS for client connections, SCRAM-SHA-256 authentication, encryption-at-rest via pg_crypto or filesystem-level (LUKS, AWS EBS encryption, GCP CSEK). For multi-tenant RAG, row-level security is the right pattern — define an RLS policy that restricts each tenant to its own rows, and the vector search inherits the restriction automatically.
Audit and accountability use Postgres's pg_audit extension plus the application-layer query logger. Every vector search executes as a parameterised SQL query, so per-tenant, per-user query attribution is straightforward. The embedding column is not personal data on its own but can be a fingerprint of the source document; treat it with the same confidentiality posture as the source.
For UK public-sector deployments, pgvector inside Postgres satisfies NCSC Cloud Security Principles when the underlying Postgres deployment does — there is nothing pgvector-specific that an NCSC-compliant Postgres would not already cover. G-Cloud 14 lot 1 (Cloud Hosting) and lot 2 (Cloud Software) listings for Postgres-as-a-Service include the extension. For EU GDPR, the right-to-erasure obligation requires DELETE-then-VACUUM rather than soft delete; vector indices reclaim space lazily, so monitor index size after high-erasure events. For US HIPAA, run inside a BAA-covered Postgres service (RDS, Cloud SQL, Azure Database) and enable encryption at rest; pgvector does not change the BAA boundary.
Row-level security on the vector table is more reliable than application-layer tenant scoping. Set `USING (tenant_id = current_setting('app.tenant_id')::uuid)` and have the application set the GUC at the start of each connection. The vector search inherits the policy automatically and no SQL bug at the application layer can leak across tenants.
Migration and alternatives#
Most migrations to pgvector come from one of three origins: a dedicated vector database (Pinecone, Weaviate, Qdrant, Milvus), an in-process library (FAISS, Annoy, ScaNN), or a managed embedding service (OpenAI's deprecated 'Files' API, Cohere, Voyage). Migrations away from pgvector almost always go in one direction — to Qdrant or Milvus when scale outgrows what Postgres can serve on a single node.
From FAISS or other in-process libraries: pgvector gives up some raw speed (FAISS GPU at billion-scale is faster) and gains transactional ACID guarantees, SQL joins onto metadata, point-in-time recovery, replication, and the ability to update vectors without rebuilding the entire index. The migration is one ingest pipeline — read vectors from the FAISS dump, COPY into Postgres, build the HNSW index. Most teams complete it in a day.
From Pinecone or Weaviate (managed): pgvector gives up operational simplicity (Pinecone serverless requires no instance sizing) and gains data residency control, no per-query pricing, and unification with the rest of the application data. The migration is an export-and-import — Pinecone's `describe_index_stats` plus `fetch` reads the corpus; bulk insert into pgvector with HNSW build. For corpora under 100 M vectors the cost crossover is usually decisive: a single mid-size RDS instance hosts what would cost thousands of dollars per month on Pinecone p-tier.
From Qdrant / Milvus / Weaviate (self-hosted): straightforward export-import. The biggest gain is reducing operational footprint by one database; the biggest loss is the specialised filter and payload-search optimisations the dedicated engines spent years building. For workloads dominated by selective metadata filters and complex payload schemas, the dedicated engines often remain the better fit.
Migrations away from pgvector go to Qdrant or Milvus when the corpus exceeds what a single Postgres node can serve, or when filter-heavy query patterns hit performance walls Postgres cannot work around. The Citus extension lets pgvector shard across multiple Postgres nodes, which delays this decision substantially but does not eliminate it.
| From | Migration effort | Cost change | Operational change |
|---|---|---|---|
| FAISS in-process | Low — one ingest job | Slight increase (Postgres infra) | Gain ACID, replication, SQL joins, transactional updates. |
| Pinecone (serverless) | Low — export + bulk insert | Often 50-90% cheaper at <100 M vectors | Lose serverless auto-scale; gain data residency, no per-query fee. |
| Pinecone (pod-based) | Low — export + bulk insert | Usually cheaper | Gain SQL joins and unified ops; lose pod-isolation guarantees. |
| Weaviate (managed or self) | Medium — schema mapping | Cheaper if you already run Postgres | Lose modules ecosystem; gain Postgres operational story. |
| Qdrant (self-hosted) | Medium — schema mapping | Cheaper if you already run Postgres | Lose payload-search optimisation; gain SQL surface. |
| Milvus (self-hosted) | Medium-high — different model | Often cheaper at <500 M vectors | Lose distributed engine; gain operational simplicity. |
| pgvector to Milvus/Qdrant | Medium — export + reload | Higher infra cost | Gain billion-scale headroom and specialised filter perf. |
Troubleshooting#
The error and symptom table below covers the failure modes that account for the majority of pgvector production incidents observed across Yobitel-operated fleets and open-source community reports. Each row maps the observable symptom to the underlying mechanism and the minimum-viable fix.
| Symptom | Cause | Fix |
|---|---|---|
| ERROR: type vector does not exist | Extension not installed in this database. | CREATE EXTENSION vector; — note: per-database, not per-server. |
| ERROR: column cannot have more than 2000 dimensions | Trying to index a vector column larger than 2,000 dim. | Use halfvec(N) up to 4,000; truncate via Matryoshka; or split into multiple indexed columns. |
| Query is slow — Seq Scan in EXPLAIN | Operator class on index does not match ORDER BY operator. | Verify CREATE INDEX used vector_cosine_ops when ORDER BY uses <=>. |
| Recall is poor — top results are wrong | hnsw.ef_search too low. | SET hnsw.ef_search = 100 (or higher); rebuild benchmark. |
| Latency p95 spikes after corpus grew | Working set no longer fits shared_buffers + OS cache. | Raise instance RAM; raise shared_buffers; consider halfvec to shrink footprint. |
| Index build takes forever | maintenance_work_mem too small or no parallel workers. | SET maintenance_work_mem = '16GB'; SET max_parallel_maintenance_workers = 8; rebuild. |
| Filter queries return fewer rows than LIMIT | HNSW returns LIMIT candidates, filter eliminates them. | On 0.8+, SET hnsw.iterative_scan = strict_order. On older, raise hnsw.ef_search. |
| DELETE leaves index bloated | HNSW does not reclaim graph nodes on delete. | REINDEX CONCURRENTLY the index after large erasures; schedule monthly for high-churn. |
| NCCL-like error on multi-node Citus deployment | Vector index not co-located with table shard. | Define distribution column on table; ensure index includes it in the operator class. |
| Cosine results all near 1.0 | Embeddings not normalised — cosine on unnormalised vectors meaningless. | Normalise at insert (vector / vector_norm(vector)) or use inner-product operator instead. |
| Cannot CREATE EXTENSION on managed Postgres | Extension not allow-listed by the provider. | RDS / Aurora / Cloud SQL / Azure all support pgvector — confirm Postgres version supports it (13+). |
| Postgres OOM during HNSW build | maintenance_work_mem too large for available RAM. | Lower maintenance_work_mem; reduce parallel workers; build incrementally with partitioning. |
| Wrong dimension on insert | Mismatch between column definition and inserted vector length. | VECTOR types are strongly typed by dimension; cast or update column to match embedding model. |
Where pgvector fits in the Yobitel stack#
pgvector is the default vector store for Yobitel's first-party AI applications when the corpus fits comfortably on a single Postgres node — which, in practice, covers the substantial majority of vertical-app deployments. MediQuery's customer-specific knowledge bases (clinical protocols, formulary, internal guidelines) typically live in pgvector inside the application's managed Postgres, with hybrid retrieval via ParadeDB BM25 plus cosine, row-level security per clinical site, and full transactional consistency between the embedding store and the rest of the application schema.
For Yobibyte customers building their own RAG applications, the managed Postgres service inside the platform ships with pgvector 0.8 pre-installed, sensible defaults for `shared_buffers` and `maintenance_work_mem`, parallel index build configured, and Prometheus exporters for the metrics listed in the observability section. Customers see SQL and standard Postgres tooling; the operational tuning is handled.
InferenceBench publishes pgvector benchmarks on standard hardware tiers — single-node throughput at 1 M / 10 M / 100 M corpora, hybrid-query latency, index-build time as a function of `maintenance_work_mem` and worker count, halfvec versus vector recall trade-offs. The data lets customers size pgvector capacity against realistic load before committing, and reproduces the pgvector-versus-Qdrant-versus-Milvus comparison on identical hardware so the dedicated-engine decision is empirical rather than vendor-marketing-driven.
References
- pgvector on GitHub · GitHub
- pgvector 0.5.0 — HNSW indexing release notes · GitHub Releases
- Supabase pgvector guide · Supabase Docs
- AWS RDS pgvector announcement · AWS
- ParadeDB pg_search — Lucene-style BM25 on Postgres · GitHub
- Efficient and robust approximate nearest neighbor search using HNSW graphs (Malkov and Yashunin, 2016) · arXiv