tech
January 28, 2026
How We Built a Web-Scale Vector Database for Our Neural Network Search Engine
This AI research blog post details the challenges of building a web-scale vector database to power our semantic search technology, and how we solved them.

TL;DR
- Exa built a custom web-scale vector database to handle complex search queries beyond Google's capabilities.
- The database stores document embeddings (vectors) to capture semantic meaning, enabling searches based on query embeddings.
- Key requirements for the database include searching billions of vectors, efficient metadata filtering, sub-100ms response times, and high query throughput at reasonable cost.
- Five core optimizations were implemented to improve memory usage and speed.
- Matryoshka embeddings reduce dimensionality (e.g., from 4096 to 256 dimensions), a 16x reduction in memory usage.
- Binary quantization further reduces memory by converting 16-bit floats to 1-bit values, an additional 16x saving.
- A hybrid search approach uses uncompressed floating-point query embeddings with binary document embeddings and dot product similarity.
- Dot product calculations are hyper-optimized by splitting vectors into subvectors and precomputing lookup tables, cutting the number of operations to roughly a quarter.
- Lookups are accelerated further by keeping the tables in CPU registers.
- Clustering divides documents into groups, enabling searches within relevant clusters for a ~1000x throughput improvement.
- These lossy optimizations are compensated for by reranking the top candidates against uncompressed embeddings.
- The database supports complex filters (date range, domain, keyword) using inverted indexes, making it function like a traditional database.
- Exa's solution is 10x cheaper than cloud vector database services and offers better sharding capabilities.
- The company has developed a custom query language for multi-stage pipeline definitions.
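To make the first two optimizations concrete, here is a minimal sketch (not Exa's actual code) of Matryoshka truncation followed by binary quantization: keep only the leading dimensions of the embedding, then keep only the sign of each remaining component, packed 8 per byte. The function name and the 4096/256 sizes mirror the numbers above.

```python
import numpy as np

def compress(embedding: np.ndarray, dims: int = 256) -> np.ndarray:
    """Truncate a Matryoshka embedding, then binary-quantize it.

    Matryoshka-trained embeddings concentrate information in the leading
    dimensions, so truncation keeps most of the signal. Binary
    quantization then keeps only the sign of each component.
    """
    truncated = embedding[:dims]              # 4096 -> 256 dims (16x)
    bits = (truncated > 0).astype(np.uint8)   # 16-bit float -> 1 bit (16x)
    return np.packbits(bits)                  # 256 bits packed into 32 bytes

full = np.random.randn(4096).astype(np.float16)
packed = compress(full)
print(full.nbytes, "->", packed.nbytes)  # 8192 -> 32
```

Combined, the two steps shrink each stored document vector by 256x in this sketch, at the cost of precision that the reranking pass later recovers.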
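The lookup-table trick for the float-query-times-binary-document dot product can be sketched as follows. This is an illustrative reconstruction, not Exa's implementation: for each 8-dimension chunk of the query, we precompute the partial dot product against all 256 possible bit patterns of a binary document subvector, so scoring a document becomes one table lookup per byte instead of 8 multiply-adds.

```python
import numpy as np

SUB = 8  # dimensions per subvector; one lookup replaces 8 multiply-adds

def build_tables(query: np.ndarray) -> np.ndarray:
    """Per 8-dim query chunk, precompute the partial dot product against
    every possible byte pattern of a binary document subvector.
    Returns an array of shape (n_chunks, 256)."""
    # Row p holds the 8 bits of byte value p, most significant bit first,
    # matching np.packbits' default bit order.
    patterns = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1)
    chunks = query.reshape(-1, SUB)
    return chunks @ patterns.astype(query.dtype).T

def lut_dot(tables: np.ndarray, packed_doc: np.ndarray) -> float:
    """Dot product of the float query with a bit-packed binary document:
    one table lookup per byte of the document."""
    return float(tables[np.arange(len(packed_doc)), packed_doc].sum())

rng = np.random.default_rng(0)
query = rng.standard_normal(256).astype(np.float32)
doc_bits = (rng.standard_normal(256) > 0).astype(np.uint8)
packed = np.packbits(doc_bits)

tables = build_tables(query)
exact = float(query @ doc_bits.astype(np.float32))
fast = lut_dot(tables, packed)
```

In a real implementation the tables are small enough to stay resident in SIMD registers, which is what the register-level optimization above refers to.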
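The clustering optimization can be sketched as an IVF-style index: group documents with a simple spherical k-means, then at query time scan only the few clusters whose centroids best match the query, reranking the survivors with full-precision embeddings. All names and parameters here are illustrative assumptions, not Exa's system.

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def build_index(docs, n_clusters=16, iters=10, seed=0):
    """Spherical k-means sketch: assign each unit-norm document embedding
    to a centroid so queries can skip irrelevant clusters entirely."""
    rng = np.random.default_rng(seed)
    centroids = docs[rng.choice(len(docs), n_clusters, replace=False)]
    assign = np.zeros(len(docs), dtype=int)
    for _ in range(iters):
        assign = np.argmax(docs @ centroids.T, axis=1)
        for c in range(n_clusters):
            members = docs[assign == c]
            if len(members):
                centroids[c] = normalize(members.mean(axis=0))
    return centroids, assign

def search(query, docs, centroids, assign, n_probe=4, k=10):
    """Probe only the n_probe best-matching clusters, then rerank the
    surviving candidates with their full-precision embeddings."""
    probes = np.argsort(centroids @ query)[-n_probe:]
    cand = np.flatnonzero(np.isin(assign, probes))
    scores = docs[cand] @ query
    return cand[np.argsort(scores)[::-1][:k]]

rng = np.random.default_rng(1)
docs = normalize(rng.standard_normal((1000, 64)).astype(np.float32))
centroids, assign = build_index(docs)
hits = search(docs[42], docs, centroids, assign)
```

Scanning 4 of 16 clusters here already skips three quarters of the corpus; at billions of documents with many more clusters, this pruning is where throughput gains on the order of the ~1000x figure come from.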