pgvector

TL;DR

pgvector is an open-source PostgreSQL extension by Andrew Kane (`ankane`), first released April 2021 under the PostgreSQL License (a permissive BSD-style licence). The 0.8 series (2026) ships HNSW and IVFFlat indices, six distance operators, and four vector types (vector, halfvec, bit, sparsevec).
Adds a `vector(N)` column type plus the distance operators `<->` (L2), `<#>` (negative inner product), `<=>` (cosine), `<+>` (L1), and Hamming / Jaccard for bit vectors. ORDER BY any operator triggers an index scan; without the operator the planner falls back to sequential.
Available out of the box on every major managed Postgres — Amazon RDS, Aurora, Cloud SQL, Azure Database for PostgreSQL, Supabase, Neon, Crunchy Data, Tembo, Timescale. Self-hosted install is `CREATE EXTENSION vector;` against any Postgres 13+.
The pitch is operational: if you already run Postgres, you do not need to add a second database for vectors. Embeddings, metadata, ACLs and application data live in the same tables with transactional consistency, the same backup and replication story, the same monitoring, the same access control model.
Trade-off versus dedicated vector databases — slower to build large indices (parallel build helps since 0.6 but still trails Qdrant or Milvus by 2-5x), less specialised filter optimisation at extreme scale (above ~500M vectors per node), no built-in vector-aware sharding. Most teams under hundreds of millions of vectors never hit those limits.

Overview

pgvector is an open-source PostgreSQL extension that adds vector data types, distance operators and approximate-nearest-neighbour (ANN) indices to Postgres. It was created by Andrew Kane (the author of much-loved Postgres extensions like pghero and pgsync) and first released in April 2021. Five years later it is the most-installed vector store in production by a wide margin, almost entirely because it removes the operational question 'should we add a vector database?' from the system-design conversation. The answer for teams already on Postgres is now usually 'no — use the database you already have'.

The 2023 release of HNSW indexing in pgvector 0.5 was the inflection point. Until then, IVFFlat was the only available index and performance lagged dedicated vector engines noticeably. HNSW closed most of that gap, and the 0.6 (parallel index build), 0.7 (halfvec, bit, sparsevec types) and 0.8 (iterative scan with metadata filters, faster bit-vector ops) releases since have made pgvector competitive on every operational dimension that matters for under-500M-vector workloads. Above that scale, dedicated engines (Qdrant, Milvus, Vespa) still win on raw throughput and specialised filter performance.

pgvector's reach is enormous. Every major managed Postgres ships it: Amazon RDS and Aurora (since May 2023), Google Cloud SQL and AlloyDB, Azure Database for PostgreSQL, Supabase (where it is the default vector store and the foundation of Supabase Vector), Neon, Crunchy Data, Tembo, Timescale, DigitalOcean Managed Databases. Self-hosted installs run on any Postgres 13+ with the development headers; the source builds in seconds. The footprint is so broad that almost every Postgres user has it available without adding a single dependency.

This entry is the operational reference for engineers building production systems on pgvector — every SQL clause, every index option, every distance operator, sizing arithmetic at 1M / 10M / 100M / 1B rows, workload patterns for RAG, recommendation and image search, observability hooks, security posture, and migration trade-offs versus FAISS, Pinecone and the open-source dedicated vector engines. Customers running pgvector on a Yobitel NeoCloud-hosted Postgres get the operational benefit of in-database vector search without standing up a separate vector store, and inherit NeoCloud's UK / EU regional posture (NCSC OFFICIAL-aligned) for the same data. This entry helps you decide if pgvector fits your similarity-search needs versus a dedicated vector DB, and if it does, how to size and operate it.

Quick start

The session below installs pgvector on a Postgres 16 instance, creates a documents table with a 1024-dimensional embedding column, builds an HNSW index for cosine distance, inserts a few rows of embeddings, and runs a similarity search filtered by tenant. It runs today against any Postgres that has the extension available — locally with docker run pgvector/pgvector:pg16, or on any managed offering.

-- 1. Enable the extension (one-time, requires the postgres role).
CREATE EXTENSION IF NOT EXISTS vector;

-- 2. Schema for a RAG-style corpus.
CREATE TABLE documents (
    id            bigserial PRIMARY KEY,
    tenant_id     uuid       NOT NULL,
    source_uri    text       NOT NULL,
    chunk_offset  int        NOT NULL,
    chunk_text    text       NOT NULL,
    embedding     vector(1024) NOT NULL,
    created_at    timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX documents_tenant_idx ON documents (tenant_id);

-- 3. Build an HNSW index for cosine distance.
-- m and ef_construction trade index build cost for query recall.
CREATE INDEX documents_embedding_hnsw_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 200);

-- 4. Insert a few rows (in practice, batched COPY from your ingest pipeline).
INSERT INTO documents (tenant_id, source_uri, chunk_offset, chunk_text, embedding) VALUES
  ('00000000-0000-0000-0000-000000000001', 's3://docs/policy.md', 0,
   'Employees are entitled to 25 days of annual leave plus statutory bank holidays.',
   '[0.012, -0.034, ...]'::vector),
  ('00000000-0000-0000-0000-000000000001', 's3://docs/policy.md', 1,
   'Overtime above 40 hours per week is paid at 1.5x the base hourly rate.',
   '[0.057, 0.011, ...]'::vector);

-- 5. Query: top-5 chunks by cosine distance, scoped to a tenant.
SET hnsw.ef_search = 100;   -- session-scoped, trades latency for recall

SELECT id, source_uri, chunk_text, embedding <=> $1 AS distance
FROM documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1
LIMIT 5;

Tip: The placeholder $1 in the ORDER BY is the query embedding (a vector(1024) literal or bind parameter). Always parameterise — concatenating embedding literals into SQL blows out the parser and leaks vectors to query logs.

How it works

pgvector is a standard Postgres extension installed via CREATE EXTENSION vector;. It adds new types, operators, operator classes, GUC variables and access methods to the database; everything else (transactions, replication, backups, ACLs, monitoring, foreign data wrappers) is inherited from Postgres itself. Vectors are stored inline in the row tuple like any other Postgres value, with TOAST handling for rows that exceed the page size — so a 1536-dimensional float32 vector (6,144 bytes) lives compressed and out-of-line in the TOAST table.

Distance operators are infix functions that the planner recognises as eligible for index scans. <-> returns L2 distance, <#> returns negative inner product (negated so smaller is closer, matching the L2 convention), <=> returns cosine distance (1 minus cosine similarity), <+> returns L1 (Manhattan) distance. For bit and sparsevec types, <~> is Hamming and <%> is Jaccard. The planner uses the index only when the ORDER BY exactly matches the index's operator class — ORDER BY embedding <=> $1 uses a cosine index, but ORDER BY embedding <-> $1 does not, even on the same column. Each index is bound to one distance metric.

Two index types ship in 0.8. HNSW (since 0.5, July 2023) is the recommended default. It builds the same Hierarchical Navigable Small World graph as the algorithm's reference implementation (Malkov and Yashunin, 2016) — see hnsw-algorithm. Build parameters m (max neighbours per node, default 16, typical production value 24-48) and ef_construction (build candidate pool, default 64, typical 100-400) determine index quality and build cost. Query-time parameter hnsw.ef_search (default 40) controls the candidate pool size during query and is set as a session GUC; raising it improves recall at linearly higher latency.

IVFFlat (since 0.1) is the older option, still useful for write-heavy workloads where index build cost dominates. It clusters vectors into lists Voronoi cells with k-means at build time, and the query-time parameter ivfflat.probes (default 1, typical 10-50) controls how many cells are searched. Recall and latency both scale roughly with probes / lists. IVFFlat builds faster than HNSW and supports streaming updates without the graph-update overhead, but its recall ceiling is lower at any given latency. The pgvector documentation now recommends HNSW for most workloads and IVFFlat only when build cost is the binding constraint.

From 0.7 onward, pgvector supports three additional vector types beyond the original vector (float32). halfvec(N) stores float16, halving memory at modest recall cost. bit(N) stores binary vectors for Hamming distance — useful with binary-quantised embeddings (Cohere's binary embeddings, JinaAI binary). sparsevec(N) stores sparse vectors with explicit (index, value) tuples for terms — the storage backing SPLADE-style learned sparse retrievers inside Postgres. The 0.8 release added iterative scan, which lets the planner work through the HNSW graph in batches when a WHERE clause filters out many candidates, dramatically improving performance on selective-filter queries.

Vector storage: inline in row, TOAST for large vectors. No separate vector store — vectors live in the same heap as the rest of the row's columns.
Distance operators: <-> L2, <#> negative inner product, <=> cosine, <+> L1 for float/half vectors; <~> Hamming, <%> Jaccard for bit/sparse.
Index types: HNSW (recommended) and IVFFlat. Each index binds to one distance metric via its operator class.
GUC variables: hnsw.ef_search, hnsw.iterative_scan, ivfflat.probes, ivfflat.iterative_scan — session- or query-scoped, no restart needed.
Parallel build: HNSW supports parallel workers via max_parallel_maintenance_workers since 0.6; typical 2-4x speedup on multi-core machines.

Reference

Canonical reference for every SQL clause and operator pgvector adds to Postgres, accurate to the 0.8 series (June 2026). Anything not listed here is documented in the upstream README; everything listed here is operationally relevant in production deployments.

Construct	Form	Purpose	Notes
Extension install	CREATE EXTENSION vector;	Activates the extension in the current database.	One-time. Requires CREATEROLE or superuser unless preinstalled.
vector type	vector(N)	Fixed-dim float32 vector column.	Max 16,000 dim stored; max 2,000 dim for HNSW/IVFFlat indices.
halfvec type	halfvec(N)	Float16 vector — half the storage of vector.	0.7+. Up to 4,000 dim indexed.
bit type	bit(N)	Binary vector for Hamming distance.	Postgres built-in type, indexed by pgvector ops.
sparsevec type	sparsevec(N)	Sparse vector with (index,value) tuples.	0.7+. Up to 1,000 non-zero indexed.
L2 operator	<->	Euclidean (L2) distance.	Smaller is closer.
Inner product operator	<#>	Negative inner product.	Negated so smaller is closer — matches ORDER BY ASC convention.
Cosine operator	<=>	Cosine distance (1 - cosine similarity).	Most common for normalised embeddings.
L1 operator	<+>	Manhattan (L1) distance.	0.7+.
Hamming operator	<~>	Hamming distance for bit vectors.	0.7+.
Jaccard operator	<%>	Jaccard distance for bit vectors.	0.7+.
HNSW index	CREATE INDEX ON t USING hnsw (col vector_cosine_ops) WITH (m=24, ef_construction=200);	Build HNSW graph for cosine distance.	Operator classes: vector_l2_ops, vector_ip_ops, vector_cosine_ops, vector_l1_ops, halfvec_*_ops, bit_hamming_ops, etc.
IVFFlat index	CREATE INDEX ON t USING ivfflat (col vector_l2_ops) WITH (lists=1000);	Build IVF index with k-means clusters.	Set lists to rows/1000 (≤1M) or sqrt(rows) (>1M).
HNSW build options	m (default 16), ef_construction (64)	Graph connectivity and build candidate pool.	Production: m=24-48, ef_construction=100-400.
IVFFlat build options	lists (default 100)	Number of Voronoi cells.	Train on a sample if dataset is huge.
HNSW query GUC	SET hnsw.ef_search = 100;	Candidate pool size at query time.	Raise for recall, lower for latency. Default 40.
IVFFlat query GUC	SET ivfflat.probes = 10;	Number of cells searched per query.	Raise for recall, lower for latency. Default 1.
Iterative scan	SET hnsw.iterative_scan = strict_order;	Continue searching past LIMIT under filter pressure.	0.8+. Use with selective WHERE clauses.
Distance functions	l2_distance(a,b), inner_product(a,b), cosine_distance(a,b)	Function form of the operators.	Useful inside aggregates, CTEs.
Vector aggregates	AVG(embedding), SUM(embedding)	Centroid / sum across rows.	0.5+. Used in clustering pipelines.
Vector arithmetic	embedding + other_embedding, embedding * 2.0	Element-wise add, subtract, scalar multiply.	Useful for query rewriting (a + b - c style).
Vector helpers	vector_dims(v), vector_norm(v)	Dimension count, L2 norm.	Useful for sanity checks and normalisation.
Subvector	subvector(v, start, count)	Slice a vector — supports Matryoshka truncation.	0.7+.
Binary quantise	binary_quantize(v)	Convert float vector to bit vector by sign.	0.7+. For binary-vector hybrid pipelines.

Note: Operator class mismatch is the most common pgvector pitfall. A HNSW index built with vector_cosine_ops accelerates ORDER BY <=> but not <->. The planner silently falls back to sequential scan; the query just gets slow. Always EXPLAIN ANALYZE the query and confirm the index appears in the plan.

Workload patterns

Three workload shapes account for the bulk of pgvector deployments: RAG inside an existing Postgres-backed application; hybrid recommendation systems where vector similarity is one signal among many in a SQL query; and image / multimodal similarity search backing product discovery or content moderation. Each pattern uses pgvector differently.

Pattern A — RAG inside an existing Postgres app. The corpus chunks, embeddings and per-tenant ACLs all live in Postgres. Hybrid retrieval combines a pgvector cosine search with Postgres full-text search (tsvector / tsquery) or ParadeDB's Lucene-based BM25, fused with Reciprocal Rank Fusion in a CTE. Critical operational decision: enforce tenant scoping at the SQL layer (WHERE tenant_id = $1), not at the application layer.

Pattern B — Recommendation with hybrid SQL filter plus vector similarity. The classic query is 'find products similar to this one, in stock, in the user's region, under $100, excluding what they've already bought'. The non-vector predicates are highly selective; the vector ranking is the tiebreaker on the surviving rows. pgvector 0.8's iterative scan dramatically improves this pattern by continuing past LIMIT until the WHERE clause is satisfied.

Pattern C — Image and multimodal similarity. Vectors come from CLIP, SigLIP or a domain-specific encoder; embeddings are 512-1024 dim float32. The dominant queries are reverse image search ('find products that look like this uploaded photo') and content moderation ('flag uploads similar to known prohibited content'). Indexing pattern is the same as RAG; the differences are at the application layer (image-encoder service, deduplication on Hamming distance with binary quantisation, large blob storage for the originals).

-- Pattern A: RAG hybrid retrieval (BM25 via ParadeDB + cosine via pgvector + RRF)
-- Requires the pg_search extension from ParadeDB for BM25.
WITH dense AS (
  SELECT id, chunk_text,
         ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank
  FROM documents
  WHERE tenant_id = $2
  ORDER BY embedding <=> $1
  LIMIT 50
),
sparse AS (
  SELECT id, chunk_text,
         ROW_NUMBER() OVER (ORDER BY paradedb.score(id) DESC) AS rank
  FROM documents
  WHERE tenant_id = $2
    AND chunk_text @@@ $3   -- ParadeDB BM25 match operator
  LIMIT 50
)
SELECT id, chunk_text, SUM(rrf) AS score
FROM (
  SELECT id, chunk_text, 1.0 / (60 + rank) AS rrf FROM dense
  UNION ALL
  SELECT id, chunk_text, 1.0 / (60 + rank) AS rrf FROM sparse
) fused
GROUP BY id, chunk_text
ORDER BY score DESC
LIMIT 10;

-- Pattern B: hybrid recommendation with selective filters
SET hnsw.iterative_scan = strict_order;
SELECT p.id, p.title, p.price,
       p.embedding <=> $1 AS distance
FROM products p
WHERE p.region_id   = $2
  AND p.in_stock    = true
  AND p.price       < 100
  AND p.id NOT IN (SELECT product_id FROM purchases WHERE user_id = $3)
ORDER BY p.embedding <=> $1
LIMIT 20;

-- Pattern C: image dedup with binary quantisation for fast Hamming
SELECT id, image_uri,
       binary_quantize(embedding) <~> binary_quantize($1) AS hamming
FROM images
WHERE binary_quantize(embedding) <~> binary_quantize($1) < 32
ORDER BY hamming
LIMIT 50;

Warning: Pattern B's selective filters are where pre-0.8 pgvector struggled — the HNSW search would return LIMIT rows before the filter could be applied, producing too-few results. Iterative scan (hnsw.iterative_scan) fixes this; on 0.7 and earlier, the workaround was raising hnsw.ef_search substantially or using IVFFlat with high probes.

Sizing and capacity planning

pgvector's storage and memory footprint are predictable. Per-row storage for a vector(N) column is approximately 4*N + 8 bytes (4 bytes per float32 plus a small header) plus the heap-row overhead Postgres adds for every tuple. HNSW index overhead is roughly M * 8 bytes of pointer storage per vector per layer the vector appears in, plus the vector itself — practically, a 1024-dimensional float32 corpus with HNSW (m=32) takes about 1.5x the raw vector storage for the index.

Build time and memory at scale follow the sizing table below. Anchors are mid-range observations from maintenance_work_mem = 4GB, four parallel workers, 1024-dimensional vectors, modern x86 (Intel Sapphire Rapids or AMD Genoa) Postgres on local NVMe. Treat as planning guides; benchmark on your actual hardware before committing.

Query latency is dominated by graph traversal cost. On HNSW with m=32 and ef_search=100, a single query against 10 M vectors typically completes in 2-8 ms cold, 0.5-2 ms warm. At 100 M vectors, expect 5-20 ms warm. Filter-heavy queries scale worse — selective WHERE clauses on metadata columns can hurt performance on the older sequential-filter path; iterative scan (0.8+) brings the worst case under control. Concurrency scales with Postgres's connection model — use PgBouncer in transaction-pooling mode for high-QPS workloads, since each Postgres backend is a process and HNSW search holds CPU.

Memory-wise, pgvector benefits dramatically from having both the heap (vectors) and the index pages cached. Aim for shared_buffers of 25-40% of instance RAM and rely on OS page cache for the rest. The HNSW graph is not memory-mappable from cold storage without huge latency penalties (page faults dominate); size RAM to comfortably hold the index for hot workloads. Quantisation helps: halfvec halves footprint at typically <1% recall loss; binary quantisation cuts footprint 32x but is appropriate only for first-stage filtering where a re-rank pass restores precision.

Corpus	Raw vector size	HNSW index size	HNSW build time	RAM during build	Practical posture
1 M vectors	~4 GB	~6 GB	~5-15 minutes	8-16 GB	Fits comfortably on a single small Postgres instance.
10 M vectors	~40 GB	~60 GB	~1-3 hours	32-64 GB	Single instance; raise `maintenance_work_mem` to 16-32 GB.
100 M vectors	~400 GB	~600 GB	~12-30 hours	128-256 GB	Single large instance or partitioned by tenant; parallel build essential.
1 B vectors	~4 TB	~6 TB	Days, parallelise across partitions	Per-partition fit	Partition by tenant or hash; shard across nodes with Citus; consider Qdrant / Milvus.

Limits and quotas

pgvector enforces some hard ceilings, mostly inherited from Postgres internals. Knowing them up front avoids painful migrations later.

Limit	Value	Source	How to work around
Max stored vector dimensions	16,000	pgvector	Use Matryoshka truncation; very few embedding models exceed 4,096 dims anyway.
Max indexed dimensions (HNSW/IVFFlat, vector)	2,000	pgvector index code	Use halfvec for up to 4,000 dim indexed; truncate embeddings; multi-index split.
Max indexed dimensions (halfvec)	4,000	pgvector index code	Sufficient for OpenAI text-embedding-3-large at 3,072 dims.
Max non-zero entries (sparsevec indexed)	1,000	pgvector	Limits SPLADE-style sparse retrievers; usually adequate.
Max rows per table	~32 TiB heap, ~4 B tuples	Postgres	Partition or shard before reaching either limit.
Single-connection concurrency	1 query at a time	Postgres process model	Use PgBouncer transaction pooling; size pool to (CPU cores x 2).
Parallel HNSW build workers	max_parallel_maintenance_workers	Postgres GUC	Raise to (cores - 2); also raise maintenance_work_mem accordingly.
Maintenance work mem for build	default 64 MB	Postgres GUC	Raise to 4-32 GB during index build; revert after.
Cross-shard / cross-database vector search	Not supported natively	Postgres	Use Citus, postgres_fdw federation, or push aggregation to application.
Concurrent index build (CONCURRENTLY)	Supported for HNSW since 0.6	pgvector	Use CREATE INDEX CONCURRENTLY for zero-downtime builds at the cost of slower build.

Warning: The 2,000-dim cap on indexed vector columns rules out indexing 3,072-dim text-embedding-3-large without halfvec, and is the most common reason teams discover 0.7's halfvec type the hard way. Plan for halfvec from day one if you are using OpenAI's large embedder or any 3,072+ dim model.

Observability

pgvector inherits Postgres's deep observability surface — pg_stat_statements for per-query timing, pg_stat_user_indexes for index hit rates, pg_buffercache for shared-buffers analysis, auto_explain for slow queries — and exposes one extra GUC family (hnsw.*, ivfflat.*) worth monitoring. The metrics worth alerting on are the ones that distinguish a healthy ANN workload from a misconfigured one: index hit rate, sequential scans on vector tables, p95 latency, and HNSW iterative-scan invocations under filter pressure.

pg_stat_user_indexes.idx_scan on the HNSW index — should grow with query traffic. Flat or near-zero means the planner is falling back to sequential scan; check operator-class match.
pg_stat_user_tables.seq_scan on the vector table — should be near zero in steady state. A spike correlates with a query that uses an unindexed distance operator or no ORDER BY.
pg_stat_statements.mean_exec_time for vector queries — alert if p95 climbs past your SLO; usually points to ef_search too high or RAM pressure forcing page faults.
shared_buffers hit ratio on the index pages — should be >99% for hot workloads. Lower means the working set has outgrown shared_buffers or OS cache.
Connection saturation — pg_stat_activity.count vs max_connections. Add PgBouncer if you regularly exceed 50% of max_connections.
Index bloat — pgvector indices accumulate dead tuples on UPDATE/DELETE-heavy tables. REINDEX CONCURRENTLY periodically (monthly for high-churn corpora).
Replication lag (pg_stat_replication) — vector index build is replicated to standbys; sustained lag during large builds is normal but bears watching for read-after-write workloads.

-- 1. Per-query slow-query analysis via pg_stat_statements.
SELECT round(mean_exec_time::numeric, 2) AS mean_ms,
       calls,
       round(stddev_exec_time::numeric, 2) AS stddev_ms,
       query
FROM pg_stat_statements
WHERE query ILIKE '%<=>%' OR query ILIKE '%<->%'
ORDER BY mean_exec_time DESC
LIMIT 20;

-- 2. Confirm the HNSW index is being used.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM documents
WHERE tenant_id = '...'
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
-- Look for: "Index Scan using documents_embedding_hnsw_idx"
-- Not:       "Seq Scan on documents"

-- 3. Index hit rate.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%embedding%';

-- 4. Index bloat estimate (after long-running churn).
SELECT pg_size_pretty(pg_relation_size('documents_embedding_hnsw_idx')) AS idx_size,
       pg_size_pretty(pg_relation_size('documents')) AS heap_size;

Cost and FinOps

pgvector's cost economics are inherited from Postgres. Self-hosted on Yobitel, AWS, GCP or Azure, you pay for compute, memory, NVMe storage and outbound network. Managed Postgres services (RDS, Aurora, Cloud SQL, Supabase, Neon) charge per instance-hour and storage-GB-month; the pgvector extension itself is free. The table below uses representative June 2026 list pricing for a RAG-shaped workload (1024-dim vectors, HNSW indices, mixed read/write at moderate QPS) — substitute your own committed-use rates when planning.

FOCUS-conformant billing exports tag each Postgres instance by tenant and environment, so per-corpus cost-per-million-queries is trivial to compute.
Storage cost typically dwarfs compute cost above 100 M vectors — halfvec, sparsevec or binary quantisation are the highest-leverage cost levers.
Read replicas scale read QPS linearly; vector index queries are pure reads and benefit fully. Writes still serialise to the primary.
Multi-AZ deployments roughly double the instance cost but are non-negotiable for production. Backups are an additional ~5-15% on top.
Reserved instances or savings plans on AWS / GCP / Azure cut instance cost 30-60% on stable workloads.

Posture	Corpus	Recommended instance	Monthly cost (USD)	Notes
Dev / prototype	100 K vectors	Local Docker or t3.medium	$0-30	Single-instance, no replica. Fits in laptop RAM.
Small production	1 M vectors	RDS db.m6g.xlarge (4 vCPU, 16 GB)	$280-450	Multi-AZ, daily backups. Comfortable for early-stage RAG.
Mid-size production	10 M vectors	RDS db.r6g.4xlarge (16 vCPU, 128 GB)	$1,800-2,800	shared_buffers 32 GB. Add read replica for QPS scale.
Large production	100 M vectors	RDS db.r6g.16xlarge (64 vCPU, 512 GB)	$8,500-12,000	Or Aurora Postgres for IO-heavy ingest. Consider partitioning by tenant.
Very large / sharded	1 B vectors	Citus on 4-8 nodes (r6g.8xlarge each)	$18,000-32,000	Hash-partition by tenant_id. Dedicated vector DB (Milvus, Qdrant) becomes competitive at this scale.
Supabase (managed)	Up to ~10 M vectors	Pro tier + add-on compute	$25-1,000	Cheapest fully-managed start; pricing scales with compute add-ons.
Neon (serverless)	Variable	Pay per CPU-second and storage	$0-2,000	Scales to zero between queries; good for spiky workloads.
Tembo Cloud (vector stack preset)	Up to 100 M vectors	Per-instance vector preset	$200-5,000	Pre-tuned pgvector with parallel build and monitoring.

Security and compliance

pgvector inherits the full Postgres security model — role-based access control with GRANT and REVOKE, row-level security (RLS) for per-row authorisation, SSL/TLS for client connections, SCRAM-SHA-256 authentication, encryption-at-rest via pg_crypto or filesystem-level (LUKS, AWS EBS encryption, GCP CSEK). For multi-tenant RAG, row-level security is the right pattern — define an RLS policy that restricts each tenant to its own rows, and the vector search inherits the restriction automatically.

Audit and accountability use Postgres's pg_audit extension plus the application-layer query logger. Every vector search executes as a parameterised SQL query, so per-tenant, per-user query attribution is straightforward. The embedding column is not personal data on its own but can be a fingerprint of the source document; treat it with the same confidentiality posture as the source.

For UK public-sector deployments, pgvector inside Postgres satisfies NCSC Cloud Security Principles when the underlying Postgres deployment does — there is nothing pgvector-specific that an NCSC-compliant Postgres would not already cover. G-Cloud 14 lot 1 (Cloud Hosting) and lot 2 (Cloud Software) listings for Postgres-as-a-Service include the extension. For EU GDPR, the right-to-erasure obligation requires DELETE-then-VACUUM rather than soft delete; vector indices reclaim space lazily, so monitor index size after high-erasure events. For US HIPAA, run inside a BAA-covered Postgres service (RDS, Cloud SQL, Azure Database) and enable encryption at rest; pgvector does not change the BAA boundary.

Tip: Row-level security on the vector table is more reliable than application-layer tenant scoping. Set USING (tenant_id = current_setting('app.tenant_id')::uuid) and have the application set the GUC at the start of each connection. The vector search inherits the policy automatically and no SQL bug at the application layer can leak across tenants.

Migration and alternatives

Most migrations to pgvector come from one of three origins: a dedicated vector database (Pinecone, Weaviate, Qdrant, Milvus), an in-process library (FAISS, Annoy, ScaNN), or a managed embedding service (OpenAI's deprecated 'Files' API, Cohere, Voyage). Migrations away from pgvector almost always go in one direction — to Qdrant or Milvus when scale outgrows what Postgres can serve on a single node.

From FAISS or other in-process libraries: pgvector gives up some raw speed (FAISS GPU at billion-scale is faster) and gains transactional ACID guarantees, SQL joins onto metadata, point-in-time recovery, replication, and the ability to update vectors without rebuilding the entire index. The migration is one ingest pipeline — read vectors from the FAISS dump, COPY into Postgres, build the HNSW index. Most teams complete it in a day.

From Pinecone or Weaviate (managed): pgvector gives up operational simplicity (Pinecone serverless requires no instance sizing) and gains data residency control, no per-query pricing, and unification with the rest of the application data. The migration is an export-and-import — Pinecone's describe_index_stats plus fetch reads the corpus; bulk insert into pgvector with HNSW build. For corpora under 100 M vectors the cost crossover is usually decisive: a single mid-size RDS instance hosts what would cost thousands of dollars per month on Pinecone p-tier.

From Qdrant / Milvus / Weaviate (self-hosted): straightforward export-import. The biggest gain is reducing operational footprint by one database; the biggest loss is the specialised filter and payload-search optimisations the dedicated engines spent years building. For workloads dominated by selective metadata filters and complex payload schemas, the dedicated engines often remain the better fit.

Migrations away from pgvector go to Qdrant or Milvus when the corpus exceeds what a single Postgres node can serve, or when filter-heavy query patterns hit performance walls Postgres cannot work around. The Citus extension lets pgvector shard across multiple Postgres nodes, which delays this decision substantially but does not eliminate it.

From	Migration effort	Cost change	Operational change
FAISS in-process	Low — one ingest job	Slight increase (Postgres infra)	Gain ACID, replication, SQL joins, transactional updates.
Pinecone (serverless)	Low — export + bulk insert	Often 50-90% cheaper at <100 M vectors	Lose serverless auto-scale; gain data residency, no per-query fee.
Pinecone (pod-based)	Low — export + bulk insert	Usually cheaper	Gain SQL joins and unified ops; lose pod-isolation guarantees.
Weaviate (managed or self)	Medium — schema mapping	Cheaper if you already run Postgres	Lose modules ecosystem; gain Postgres operational story.
Qdrant (self-hosted)	Medium — schema mapping	Cheaper if you already run Postgres	Lose payload-search optimisation; gain SQL surface.
Milvus (self-hosted)	Medium-high — different model	Often cheaper at <500 M vectors	Lose distributed engine; gain operational simplicity.
pgvector to Milvus/Qdrant	Medium — export + reload	Higher infra cost	Gain billion-scale headroom and specialised filter perf.

Troubleshooting

The error and symptom table below covers the failure modes that account for the majority of pgvector production incidents observed across Yobitel-operated fleets and open-source community reports. Each row maps the observable symptom to the underlying mechanism and the minimum-viable fix.

Symptom	Cause	Fix
ERROR: type vector does not exist	Extension not installed in this database.	CREATE EXTENSION vector; — note: per-database, not per-server.
ERROR: column cannot have more than 2000 dimensions	Trying to index a vector column larger than 2,000 dim.	Use halfvec(N) up to 4,000; truncate via Matryoshka; or split into multiple indexed columns.
Query is slow — Seq Scan in EXPLAIN	Operator class on index does not match ORDER BY operator.	Verify CREATE INDEX used vector_cosine_ops when ORDER BY uses <=>.
Recall is poor — top results are wrong	hnsw.ef_search too low.	SET hnsw.ef_search = 100 (or higher); rebuild benchmark.
Latency p95 spikes after corpus grew	Working set no longer fits shared_buffers + OS cache.	Raise instance RAM; raise shared_buffers; consider halfvec to shrink footprint.
Index build takes forever	maintenance_work_mem too small or no parallel workers.	SET maintenance_work_mem = '16GB'; SET max_parallel_maintenance_workers = 8; rebuild.
Filter queries return fewer rows than LIMIT	HNSW returns LIMIT candidates, filter eliminates them.	On 0.8+, SET hnsw.iterative_scan = strict_order. On older, raise hnsw.ef_search.
DELETE leaves index bloated	HNSW does not reclaim graph nodes on delete.	REINDEX CONCURRENTLY the index after large erasures; schedule monthly for high-churn.
NCCL-like error on multi-node Citus deployment	Vector index not co-located with table shard.	Define distribution column on table; ensure index includes it in the operator class.
Cosine results all near 1.0	Embeddings not normalised — cosine on unnormalised vectors meaningless.	Normalise at insert (vector / vector_norm(vector)) or use inner-product operator instead.
Cannot CREATE EXTENSION on managed Postgres	Extension not allow-listed by the provider.	RDS / Aurora / Cloud SQL / Azure all support pgvector — confirm Postgres version supports it (13+).
Postgres OOM during HNSW build	maintenance_work_mem too large for available RAM.	Lower maintenance_work_mem; reduce parallel workers; build incrementally with partitioning.
Wrong dimension on insert	Mismatch between column definition and inserted vector length.	VECTOR types are strongly typed by dimension; cast or update column to match embedding model.

Where pgvector fits in the Yobitel stack

pgvector is the default vector store for Yobitel's first-party AI applications when the corpus fits comfortably on a single Postgres node — which, in practice, covers the substantial majority of vertical-app deployments. MediQuery's customer-specific knowledge bases (clinical protocols, formulary, internal guidelines) typically live in pgvector inside the application's managed Postgres, with hybrid retrieval via ParadeDB BM25 plus cosine, row-level security per clinical site, and full transactional consistency between the embedding store and the rest of the application schema.

For Yobibyte customers building their own RAG applications, the managed Postgres service inside the platform ships with pgvector 0.8 pre-installed, sensible defaults for shared_buffers and maintenance_work_mem, parallel index build configured, and Prometheus exporters for the metrics listed in the observability section. Customers see SQL and standard Postgres tooling; the operational tuning is handled.

InferenceBench publishes pgvector benchmarks on standard hardware tiers — single-node throughput at 1 M / 10 M / 100 M corpora, hybrid-query latency, index-build time as a function of maintenance_work_mem and worker count, halfvec versus vector recall trade-offs. The data lets customers size pgvector capacity against realistic load before committing, and reproduces the pgvector-versus-Qdrant-versus-Milvus comparison on identical hardware so the dedicated-engine decision is empirical rather than vendor-marketing-driven.

References

pgvector on GitHub · GitHub
pgvector 0.5.0 — HNSW indexing release notes · GitHub Releases
Supabase pgvector guide · Supabase Docs
AWS RDS pgvector announcement · AWS
ParadeDB pg_search — Lucene-style BM25 on Postgres · GitHub
Efficient and robust approximate nearest neighbor search using HNSW graphs (Malkov and Yashunin, 2016) · arXiv

TL;DR

pgvector is an open-source PostgreSQL extension by Andrew Kane (`ankane`), first released April 2021 under the PostgreSQL License (a permissive BSD-style licence). The 0.8 series (2026) ships HNSW and IVFFlat indices, six distance operators, and four vector types (vector, halfvec, bit, sparsevec).
Adds a `vector(N)` column type plus the distance operators `<->` (L2), `<#>` (negative inner product), `<=>` (cosine), `<+>` (L1), and Hamming / Jaccard for bit vectors. ORDER BY any operator triggers an index scan; without the operator the planner falls back to sequential.
Available out of the box on every major managed Postgres — Amazon RDS, Aurora, Cloud SQL, Azure Database for PostgreSQL, Supabase, Neon, Crunchy Data, Tembo, Timescale. Self-hosted install is `CREATE EXTENSION vector;` against any Postgres 13+.
The pitch is operational: if you already run Postgres, you do not need to add a second database for vectors. Embeddings, metadata, ACLs and application data live in the same tables with transactional consistency, the same backup and replication story, the same monitoring, the same access control model.
Trade-off versus dedicated vector databases — slower to build large indices (parallel build helps since 0.6 but still trails Qdrant or Milvus by 2-5x), less specialised filter optimisation at extreme scale (above ~500M vectors per node), no built-in vector-aware sharding. Most teams under hundreds of millions of vectors never hit those limits.

Overview

Quick start

-- 1. Enable the extension (one-time, requires the postgres role).
CREATE EXTENSION IF NOT EXISTS vector;

-- 2. Schema for a RAG-style corpus.
CREATE TABLE documents (
    id            bigserial PRIMARY KEY,
    tenant_id     uuid       NOT NULL,
    source_uri    text       NOT NULL,
    chunk_offset  int        NOT NULL,
    chunk_text    text       NOT NULL,
    embedding     vector(1024) NOT NULL,
    created_at    timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX documents_tenant_idx ON documents (tenant_id);

-- 3. Build an HNSW index for cosine distance.
-- m and ef_construction trade index build cost for query recall.
CREATE INDEX documents_embedding_hnsw_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 200);

-- 4. Insert a few rows (in practice, batched COPY from your ingest pipeline).
INSERT INTO documents (tenant_id, source_uri, chunk_offset, chunk_text, embedding) VALUES
  ('00000000-0000-0000-0000-000000000001', 's3://docs/policy.md', 0,
   'Employees are entitled to 25 days of annual leave plus statutory bank holidays.',
   '[0.012, -0.034, ...]'::vector),
  ('00000000-0000-0000-0000-000000000001', 's3://docs/policy.md', 1,
   'Overtime above 40 hours per week is paid at 1.5x the base hourly rate.',
   '[0.057, 0.011, ...]'::vector);

-- 5. Query: top-5 chunks by cosine distance, scoped to a tenant.
SET hnsw.ef_search = 100;   -- session-scoped, trades latency for recall

SELECT id, source_uri, chunk_text, embedding <=> $1 AS distance
FROM documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1
LIMIT 5;

Tip: The placeholder $1 in the ORDER BY is the query embedding (a vector(1024) literal or bind parameter). Always parameterise — concatenating embedding literals into SQL blows out the parser and leaks vectors to query logs.

How it works

Vector storage: inline in row, TOAST for large vectors. No separate vector store — vectors live in the same heap as the rest of the row's columns.
Distance operators: <-> L2, <#> negative inner product, <=> cosine, <+> L1 for float/half vectors; <~> Hamming, <%> Jaccard for bit/sparse.
Index types: HNSW (recommended) and IVFFlat. Each index binds to one distance metric via its operator class.
GUC variables: hnsw.ef_search, hnsw.iterative_scan, ivfflat.probes, ivfflat.iterative_scan — session- or query-scoped, no restart needed.
Parallel build: HNSW supports parallel workers via max_parallel_maintenance_workers since 0.6; typical 2-4x speedup on multi-core machines.

Reference

Construct	Form	Purpose	Notes
Extension install	CREATE EXTENSION vector;	Activates the extension in the current database.	One-time. Requires CREATEROLE or superuser unless preinstalled.
vector type	vector(N)	Fixed-dim float32 vector column.	Max 16,000 dim stored; max 2,000 dim for HNSW/IVFFlat indices.
halfvec type	halfvec(N)	Float16 vector — half the storage of vector.	0.7+. Up to 4,000 dim indexed.
bit type	bit(N)	Binary vector for Hamming distance.	Postgres built-in type, indexed by pgvector ops.
sparsevec type	sparsevec(N)	Sparse vector with (index,value) tuples.	0.7+. Up to 1,000 non-zero indexed.
L2 operator	<->	Euclidean (L2) distance.	Smaller is closer.
Inner product operator	<#>	Negative inner product.	Negated so smaller is closer — matches ORDER BY ASC convention.
Cosine operator	<=>	Cosine distance (1 - cosine similarity).	Most common for normalised embeddings.
L1 operator	<+>	Manhattan (L1) distance.	0.7+.
Hamming operator	<~>	Hamming distance for bit vectors.	0.7+.
Jaccard operator	<%>	Jaccard distance for bit vectors.	0.7+.
HNSW index	CREATE INDEX ON t USING hnsw (col vector_cosine_ops) WITH (m=24, ef_construction=200);	Build HNSW graph for cosine distance.	Operator classes: vector_l2_ops, vector_ip_ops, vector_cosine_ops, vector_l1_ops, halfvec_*_ops, bit_hamming_ops, etc.
IVFFlat index	CREATE INDEX ON t USING ivfflat (col vector_l2_ops) WITH (lists=1000);	Build IVF index with k-means clusters.	Set lists to rows/1000 (≤1M) or sqrt(rows) (>1M).
HNSW build options	m (default 16), ef_construction (64)	Graph connectivity and build candidate pool.	Production: m=24-48, ef_construction=100-400.
IVFFlat build options	lists (default 100)	Number of Voronoi cells.	Train on a sample if dataset is huge.
HNSW query GUC	SET hnsw.ef_search = 100;	Candidate pool size at query time.	Raise for recall, lower for latency. Default 40.
IVFFlat query GUC	SET ivfflat.probes = 10;	Number of cells searched per query.	Raise for recall, lower for latency. Default 1.
Iterative scan	SET hnsw.iterative_scan = strict_order;	Continue searching past LIMIT under filter pressure.	0.8+. Use with selective WHERE clauses.
Distance functions	l2_distance(a,b), inner_product(a,b), cosine_distance(a,b)	Function form of the operators.	Useful inside aggregates, CTEs.
Vector aggregates	AVG(embedding), SUM(embedding)	Centroid / sum across rows.	0.5+. Used in clustering pipelines.
Vector arithmetic	embedding + other_embedding, embedding * 2.0	Element-wise add, subtract, scalar multiply.	Useful for query rewriting (a + b - c style).
Vector helpers	vector_dims(v), vector_norm(v)	Dimension count, L2 norm.	Useful for sanity checks and normalisation.
Subvector	subvector(v, start, count)	Slice a vector — supports Matryoshka truncation.	0.7+.
Binary quantise	binary_quantize(v)	Convert float vector to bit vector by sign.	0.7+. For binary-vector hybrid pipelines.

Note: Operator class mismatch is the most common pgvector pitfall. A HNSW index built with vector_cosine_ops accelerates ORDER BY <=> but not <->. The planner silently falls back to sequential scan; the query just gets slow. Always EXPLAIN ANALYZE the query and confirm the index appears in the plan.

Workload patterns

-- Pattern A: RAG hybrid retrieval (BM25 via ParadeDB + cosine via pgvector + RRF)
-- Requires the pg_search extension from ParadeDB for BM25.
WITH dense AS (
  SELECT id, chunk_text,
         ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank
  FROM documents
  WHERE tenant_id = $2
  ORDER BY embedding <=> $1
  LIMIT 50
),
sparse AS (
  SELECT id, chunk_text,
         ROW_NUMBER() OVER (ORDER BY paradedb.score(id) DESC) AS rank
  FROM documents
  WHERE tenant_id = $2
    AND chunk_text @@@ $3   -- ParadeDB BM25 match operator
  LIMIT 50
)
SELECT id, chunk_text, SUM(rrf) AS score
FROM (
  SELECT id, chunk_text, 1.0 / (60 + rank) AS rrf FROM dense
  UNION ALL
  SELECT id, chunk_text, 1.0 / (60 + rank) AS rrf FROM sparse
) fused
GROUP BY id, chunk_text
ORDER BY score DESC
LIMIT 10;

-- Pattern B: hybrid recommendation with selective filters
SET hnsw.iterative_scan = strict_order;
SELECT p.id, p.title, p.price,
       p.embedding <=> $1 AS distance
FROM products p
WHERE p.region_id   = $2
  AND p.in_stock    = true
  AND p.price       < 100
  AND p.id NOT IN (SELECT product_id FROM purchases WHERE user_id = $3)
ORDER BY p.embedding <=> $1
LIMIT 20;

-- Pattern C: image dedup with binary quantisation for fast Hamming
SELECT id, image_uri,
       binary_quantize(embedding) <~> binary_quantize($1) AS hamming
FROM images
WHERE binary_quantize(embedding) <~> binary_quantize($1) < 32
ORDER BY hamming
LIMIT 50;

Warning: Pattern B's selective filters are where pre-0.8 pgvector struggled — the HNSW search would return LIMIT rows before the filter could be applied, producing too-few results. Iterative scan (hnsw.iterative_scan) fixes this; on 0.7 and earlier, the workaround was raising hnsw.ef_search substantially or using IVFFlat with high probes.

Sizing and capacity planning

Corpus	Raw vector size	HNSW index size	HNSW build time	RAM during build	Practical posture
1 M vectors	~4 GB	~6 GB	~5-15 minutes	8-16 GB	Fits comfortably on a single small Postgres instance.
10 M vectors	~40 GB	~60 GB	~1-3 hours	32-64 GB	Single instance; raise `maintenance_work_mem` to 16-32 GB.
100 M vectors	~400 GB	~600 GB	~12-30 hours	128-256 GB	Single large instance or partitioned by tenant; parallel build essential.
1 B vectors	~4 TB	~6 TB	Days, parallelise across partitions	Per-partition fit	Partition by tenant or hash; shard across nodes with Citus; consider Qdrant / Milvus.

Limits and quotas

pgvector enforces some hard ceilings, mostly inherited from Postgres internals. Knowing them up front avoids painful migrations later.

Limit	Value	Source	How to work around
Max stored vector dimensions	16,000	pgvector	Use Matryoshka truncation; very few embedding models exceed 4,096 dims anyway.
Max indexed dimensions (HNSW/IVFFlat, vector)	2,000	pgvector index code	Use halfvec for up to 4,000 dim indexed; truncate embeddings; multi-index split.
Max indexed dimensions (halfvec)	4,000	pgvector index code	Sufficient for OpenAI text-embedding-3-large at 3,072 dims.
Max non-zero entries (sparsevec indexed)	1,000	pgvector	Limits SPLADE-style sparse retrievers; usually adequate.
Max rows per table	~32 TiB heap, ~4 B tuples	Postgres	Partition or shard before reaching either limit.
Single-connection concurrency	1 query at a time	Postgres process model	Use PgBouncer transaction pooling; size pool to (CPU cores x 2).
Parallel HNSW build workers	max_parallel_maintenance_workers	Postgres GUC	Raise to (cores - 2); also raise maintenance_work_mem accordingly.
Maintenance work mem for build	default 64 MB	Postgres GUC	Raise to 4-32 GB during index build; revert after.
Cross-shard / cross-database vector search	Not supported natively	Postgres	Use Citus, postgres_fdw federation, or push aggregation to application.
Concurrent index build (CONCURRENTLY)	Supported for HNSW since 0.6	pgvector	Use CREATE INDEX CONCURRENTLY for zero-downtime builds at the cost of slower build.

Warning: The 2,000-dim cap on indexed vector columns rules out indexing 3,072-dim text-embedding-3-large without halfvec, and is the most common reason teams discover 0.7's halfvec type the hard way. Plan for halfvec from day one if you are using OpenAI's large embedder or any 3,072+ dim model.

Observability

pg_stat_user_indexes.idx_scan on the HNSW index — should grow with query traffic. Flat or near-zero means the planner is falling back to sequential scan; check operator-class match.
pg_stat_user_tables.seq_scan on the vector table — should be near zero in steady state. A spike correlates with a query that uses an unindexed distance operator or no ORDER BY.
pg_stat_statements.mean_exec_time for vector queries — alert if p95 climbs past your SLO; usually points to ef_search too high or RAM pressure forcing page faults.
shared_buffers hit ratio on the index pages — should be >99% for hot workloads. Lower means the working set has outgrown shared_buffers or OS cache.
Connection saturation — pg_stat_activity.count vs max_connections. Add PgBouncer if you regularly exceed 50% of max_connections.
Index bloat — pgvector indices accumulate dead tuples on UPDATE/DELETE-heavy tables. REINDEX CONCURRENTLY periodically (monthly for high-churn corpora).
Replication lag (pg_stat_replication) — vector index build is replicated to standbys; sustained lag during large builds is normal but bears watching for read-after-write workloads.

-- 1. Per-query slow-query analysis via pg_stat_statements.
SELECT round(mean_exec_time::numeric, 2) AS mean_ms,
       calls,
       round(stddev_exec_time::numeric, 2) AS stddev_ms,
       query
FROM pg_stat_statements
WHERE query ILIKE '%<=>%' OR query ILIKE '%<->%'
ORDER BY mean_exec_time DESC
LIMIT 20;

-- 2. Confirm the HNSW index is being used.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM documents
WHERE tenant_id = '...'
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
-- Look for: "Index Scan using documents_embedding_hnsw_idx"
-- Not:       "Seq Scan on documents"

-- 3. Index hit rate.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%embedding%';

-- 4. Index bloat estimate (after long-running churn).
SELECT pg_size_pretty(pg_relation_size('documents_embedding_hnsw_idx')) AS idx_size,
       pg_size_pretty(pg_relation_size('documents')) AS heap_size;

Cost and FinOps

FOCUS-conformant billing exports tag each Postgres instance by tenant and environment, so per-corpus cost-per-million-queries is trivial to compute.
Storage cost typically dwarfs compute cost above 100 M vectors — halfvec, sparsevec or binary quantisation are the highest-leverage cost levers.
Read replicas scale read QPS linearly; vector index queries are pure reads and benefit fully. Writes still serialise to the primary.
Multi-AZ deployments roughly double the instance cost but are non-negotiable for production. Backups are an additional ~5-15% on top.
Reserved instances or savings plans on AWS / GCP / Azure cut instance cost 30-60% on stable workloads.

Posture	Corpus	Recommended instance	Monthly cost (USD)	Notes
Dev / prototype	100 K vectors	Local Docker or t3.medium	$0-30	Single-instance, no replica. Fits in laptop RAM.
Small production	1 M vectors	RDS db.m6g.xlarge (4 vCPU, 16 GB)	$280-450	Multi-AZ, daily backups. Comfortable for early-stage RAG.
Mid-size production	10 M vectors	RDS db.r6g.4xlarge (16 vCPU, 128 GB)	$1,800-2,800	shared_buffers 32 GB. Add read replica for QPS scale.
Large production	100 M vectors	RDS db.r6g.16xlarge (64 vCPU, 512 GB)	$8,500-12,000	Or Aurora Postgres for IO-heavy ingest. Consider partitioning by tenant.
Very large / sharded	1 B vectors	Citus on 4-8 nodes (r6g.8xlarge each)	$18,000-32,000	Hash-partition by tenant_id. Dedicated vector DB (Milvus, Qdrant) becomes competitive at this scale.
Supabase (managed)	Up to ~10 M vectors	Pro tier + add-on compute	$25-1,000	Cheapest fully-managed start; pricing scales with compute add-ons.
Neon (serverless)	Variable	Pay per CPU-second and storage	$0-2,000	Scales to zero between queries; good for spiky workloads.
Tembo Cloud (vector stack preset)	Up to 100 M vectors	Per-instance vector preset	$200-5,000	Pre-tuned pgvector with parallel build and monitoring.

Security and compliance

Tip: Row-level security on the vector table is more reliable than application-layer tenant scoping. Set USING (tenant_id = current_setting('app.tenant_id')::uuid) and have the application set the GUC at the start of each connection. The vector search inherits the policy automatically and no SQL bug at the application layer can leak across tenants.

Migration and alternatives

From	Migration effort	Cost change	Operational change
FAISS in-process	Low — one ingest job	Slight increase (Postgres infra)	Gain ACID, replication, SQL joins, transactional updates.
Pinecone (serverless)	Low — export + bulk insert	Often 50-90% cheaper at <100 M vectors	Lose serverless auto-scale; gain data residency, no per-query fee.
Pinecone (pod-based)	Low — export + bulk insert	Usually cheaper	Gain SQL joins and unified ops; lose pod-isolation guarantees.
Weaviate (managed or self)	Medium — schema mapping	Cheaper if you already run Postgres	Lose modules ecosystem; gain Postgres operational story.
Qdrant (self-hosted)	Medium — schema mapping	Cheaper if you already run Postgres	Lose payload-search optimisation; gain SQL surface.
Milvus (self-hosted)	Medium-high — different model	Often cheaper at <500 M vectors	Lose distributed engine; gain operational simplicity.
pgvector to Milvus/Qdrant	Medium — export + reload	Higher infra cost	Gain billion-scale headroom and specialised filter perf.

Troubleshooting

Symptom	Cause	Fix
ERROR: type vector does not exist	Extension not installed in this database.	CREATE EXTENSION vector; — note: per-database, not per-server.
ERROR: column cannot have more than 2000 dimensions	Trying to index a vector column larger than 2,000 dim.	Use halfvec(N) up to 4,000; truncate via Matryoshka; or split into multiple indexed columns.
Query is slow — Seq Scan in EXPLAIN	Operator class on index does not match ORDER BY operator.	Verify CREATE INDEX used vector_cosine_ops when ORDER BY uses <=>.
Recall is poor — top results are wrong	hnsw.ef_search too low.	SET hnsw.ef_search = 100 (or higher); rebuild benchmark.
Latency p95 spikes after corpus grew	Working set no longer fits shared_buffers + OS cache.	Raise instance RAM; raise shared_buffers; consider halfvec to shrink footprint.
Index build takes forever	maintenance_work_mem too small or no parallel workers.	SET maintenance_work_mem = '16GB'; SET max_parallel_maintenance_workers = 8; rebuild.
Filter queries return fewer rows than LIMIT	HNSW returns LIMIT candidates, filter eliminates them.	On 0.8+, SET hnsw.iterative_scan = strict_order. On older, raise hnsw.ef_search.
DELETE leaves index bloated	HNSW does not reclaim graph nodes on delete.	REINDEX CONCURRENTLY the index after large erasures; schedule monthly for high-churn.
NCCL-like error on multi-node Citus deployment	Vector index not co-located with table shard.	Define distribution column on table; ensure index includes it in the operator class.
Cosine results all near 1.0	Embeddings not normalised — cosine on unnormalised vectors meaningless.	Normalise at insert (vector / vector_norm(vector)) or use inner-product operator instead.
Cannot CREATE EXTENSION on managed Postgres	Extension not allow-listed by the provider.	RDS / Aurora / Cloud SQL / Azure all support pgvector — confirm Postgres version supports it (13+).
Postgres OOM during HNSW build	maintenance_work_mem too large for available RAM.	Lower maintenance_work_mem; reduce parallel workers; build incrementally with partitioning.
Wrong dimension on insert	Mismatch between column definition and inserted vector length.	VECTOR types are strongly typed by dimension; cast or update column to match embedding model.

Where pgvector fits in the Yobitel stack

References

pgvector on GitHub · GitHub
pgvector 0.5.0 — HNSW indexing release notes · GitHub Releases
Supabase pgvector guide · Supabase Docs
AWS RDS pgvector announcement · AWS
ParadeDB pg_search — Lucene-style BM25 on Postgres · GitHub
Efficient and robust approximate nearest neighbor search using HNSW graphs (Malkov and Yashunin, 2016) · arXiv

pgvector

Overview

Quick start

How it works

Reference

Workload patterns

Sizing and capacity planning

Limits and quotas

Observability

Cost and FinOps

Security and compliance

Migration and alternatives

Troubleshooting

Where pgvector fits in the Yobitel stack

References

Browse all entries

Deploy on Yobibyte

pgvector

Overview

Quick start

How it works

Reference

Workload patterns

Sizing and capacity planning

Limits and quotas

Observability

Cost and FinOps

Security and compliance

Migration and alternatives

Troubleshooting

Where pgvector fits in the Yobitel stack

References

Browse all entries

Deploy on Yobibyte