tech
January 28, 2026
How We Built a Web-Scale Vector Database for Our Neural Network Search Engine
This AI research blog post details the challenges of building a web-scale vector database to power our semantic search technology, and how we solved them.

TL;DR
- Exa built a custom web-scale vector database to handle complex search queries beyond Google's capabilities.
- The database stores document embeddings (vectors) to capture semantic meaning, enabling searches based on query embeddings.
- Key requirements for the database include searching billions of vectors, efficient metadata filtering, sub-100ms response times, and high query throughput at reasonable cost.
- Five core optimizations were implemented to improve memory usage and speed.
- Matryoshka embeddings reduce dimensionality (e.g., from 4096 to 256 dimensions), a 16x reduction in memory usage.
- Binary quantization further reduces memory by converting 16-bit floats to 1-bit values, an additional 16x saving.
- A hybrid search approach uses uncompressed floating-point query embeddings with binary document embeddings and dot product similarity.
- Dot product calculations are hyper-optimized by splitting vectors into subvectors and precomputing lookup tables, cutting the number of operations to roughly a quarter.
- Lookups are accelerated further by keeping the tables in CPU registers.
- Clustering divides documents into groups, enabling searches within relevant clusters for a ~1000x throughput improvement.
- These lossy optimizations are compensated for by reranking the top candidates against uncompressed embeddings.
- The database supports complex filters (date range, domain, keyword) using inverted indexes, making it function like a traditional database.
- Exa's solution is 10x cheaper than cloud vector database services and offers better sharding capabilities.
- The company has developed a custom query language for multi-stage pipeline definitions.
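To make the first two optimizations concrete, here is a minimal sketch (not Exa's actual code) of Matryoshka truncation followed by binary quantization: keep only the leading dimensions of the embedding, then keep only the sign of each remaining component, packed 8 per byte. The function name and the 4096/256 sizes mirror the numbers above.

```python
import numpy as np

def compress(embedding: np.ndarray, dims: int = 256) -> np.ndarray:
    """Truncate a Matryoshka embedding, then binary-quantize it.

    Matryoshka-trained embeddings concentrate information in the leading
    dimensions, so truncation keeps most of the signal. Binary
    quantization then keeps only the sign of each component.
    """
    truncated = embedding[:dims]              # 4096 -> 256 dims (16x)
    bits = (truncated > 0).astype(np.uint8)   # 16-bit float -> 1 bit (16x)
    return np.packbits(bits)                  # 256 bits packed into 32 bytes

full = np.random.randn(4096).astype(np.float16)
packed = compress(full)
print(full.nbytes, "->", packed.nbytes)  # 8192 -> 32
```

Combined, the two steps shrink each stored document vector by 256x in this sketch, at the cost of precision that the reranking pass later recovers.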
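The lookup-table trick for the float-query-times-binary-document dot product can be sketched as follows. This is an illustrative reconstruction, not Exa's implementation: for each 8-dimension chunk of the query, we precompute the partial dot product against all 256 possible bit patterns of a binary document subvector, so scoring a document becomes one table lookup per byte instead of 8 multiply-adds.

```python
import numpy as np

SUB = 8  # dimensions per subvector; one lookup replaces 8 multiply-adds

def build_tables(query: np.ndarray) -> np.ndarray:
    """Per 8-dim query chunk, precompute the partial dot product against
    every possible byte pattern of a binary document subvector.
    Returns an array of shape (n_chunks, 256)."""
    # Row p holds the 8 bits of byte value p, most significant bit first,
    # matching np.packbits' default bit order.
    patterns = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1)
    chunks = query.reshape(-1, SUB)
    return chunks @ patterns.astype(query.dtype).T

def lut_dot(tables: np.ndarray, packed_doc: np.ndarray) -> float:
    """Dot product of the float query with a bit-packed binary document:
    one table lookup per byte of the document."""
    return float(tables[np.arange(len(packed_doc)), packed_doc].sum())

rng = np.random.default_rng(0)
query = rng.standard_normal(256).astype(np.float32)
doc_bits = (rng.standard_normal(256) > 0).astype(np.uint8)
packed = np.packbits(doc_bits)

tables = build_tables(query)
exact = float(query @ doc_bits.astype(np.float32))
fast = lut_dot(tables, packed)
```

In a real implementation the tables are small enough to stay resident in SIMD registers, which is what the register-level optimization above refers to.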
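The clustering optimization can be sketched as an IVF-style index: group documents with a simple spherical k-means, then at query time scan only the few clusters whose centroids best match the query, reranking the survivors with full-precision embeddings. All names and parameters here are illustrative assumptions, not Exa's system.

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def build_index(docs, n_clusters=16, iters=10, seed=0):
    """Spherical k-means sketch: assign each unit-norm document embedding
    to a centroid so queries can skip irrelevant clusters entirely."""
    rng = np.random.default_rng(seed)
    centroids = docs[rng.choice(len(docs), n_clusters, replace=False)]
    assign = np.zeros(len(docs), dtype=int)
    for _ in range(iters):
        assign = np.argmax(docs @ centroids.T, axis=1)
        for c in range(n_clusters):
            members = docs[assign == c]
            if len(members):
                centroids[c] = normalize(members.mean(axis=0))
    return centroids, assign

def search(query, docs, centroids, assign, n_probe=4, k=10):
    """Probe only the n_probe best-matching clusters, then rerank the
    surviving candidates with their full-precision embeddings."""
    probes = np.argsort(centroids @ query)[-n_probe:]
    cand = np.flatnonzero(np.isin(assign, probes))
    scores = docs[cand] @ query
    return cand[np.argsort(scores)[::-1][:k]]

rng = np.random.default_rng(1)
docs = normalize(rng.standard_normal((1000, 64)).astype(np.float32))
centroids, assign = build_index(docs)
hits = search(docs[42], docs, centroids, assign)
```

Scanning 4 of 16 clusters here already skips three quarters of the corpus; at billions of documents with many more clusters, this pruning is where throughput gains on the order of the ~1000x figure come from.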