
January 28, 2026

How We Built a Web-Scale Vector Database for Our Neural Network Search Engine

This AI research blog post details the challenges and solutions of creating a web-scale vector database to power our semantic search technology.


TL;DR

  • Exa built a custom web-scale vector database to handle complex search queries beyond Google's capabilities.
  • The database stores document embeddings (vectors) to capture semantic meaning, enabling searches based on query embeddings.
  • Key requirements for the database include searching billions of vectors, efficient metadata filtering, sub-100ms response times, and high query throughput at reasonable cost.
  • Five core optimizations were implemented to improve memory usage and speed.
  • Matryoshka embeddings reduce dimensionality (e.g., from 4096 to 256 dimensions), cutting memory usage by 16x.
  • Binary quantization further reduces memory by converting 16-bit floats to 1-bit values, an additional 16x saving.
  • A hybrid search approach uses uncompressed floating-point query embeddings with binary document embeddings and dot product similarity.
  • Dot product calculations are hyper-optimized by splitting vectors into subvectors and precomputing lookup tables, cutting the number of operations to a quarter.
  • Lookup times are accelerated by loading lookup tables into CPU registers.
  • Clustering divides documents into groups, enabling searches within relevant clusters for a ~1000x throughput improvement.
  • The accuracy lost to these lossy optimizations is recovered by reranking the top candidates with uncompressed data.
  • The database supports complex filters (date range, domain, keyword) using inverted indexes, making it function like a traditional database.
  • Exa's solution is 10x cheaper than cloud vector database services and offers better sharding capabilities.
  • The company has developed a custom query language for multi-stage pipeline definitions.
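To make the compression bullets concrete, here is a minimal NumPy sketch of Matryoshka truncation followed by binary quantization. The shapes and dtypes (16-bit floats, 4096 dimensions truncated to 256) come from the post; the random data is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical corpus of 16-bit float embeddings, 4096 dims each.
full = rng.standard_normal((1000, 4096)).astype(np.float16)

# Matryoshka truncation: the leading dimensions carry most of the signal,
# so we simply keep the first 256.
truncated = full[:, :256].astype(np.float32)

# Binary quantization: 1 bit per dimension via the sign, packed 8 per byte.
bits = (truncated > 0).astype(np.uint8)
packed = np.packbits(bits, axis=1)  # shape (1000, 32): 32 bytes per document

print(full[0].nbytes, packed[0].nbytes)  # 8192 bytes -> 32 bytes per document
```

Per document this is 4096 × 2 bytes = 8192 bytes down to 256 bits = 32 bytes, i.e., the 16x from truncation compounded with the 16x from quantization.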
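The subvector/lookup-table bullet can be sketched as follows: the float query is split into 4-dimensional slices, and for each slice we precompute its dot product with all 16 possible 4-bit patterns (bits mapped to ±1). Scoring a binary document then takes one table lookup per 4 dimensions instead of four multiply-adds. This is an illustrative reconstruction, not Exa's actual kernel; sizes and names are assumptions.

```python
import numpy as np

DIM, SUB = 256, 4          # 4-bit subvectors -> 16 possible patterns each
N_SUB = DIM // SUB

rng = np.random.default_rng(1)
query = rng.standard_normal(DIM).astype(np.float32)            # uncompressed query
doc_bits = rng.integers(0, 2, size=(5, DIM)).astype(np.int8)   # binary documents

# Precompute, per subvector position, the dot product of that query slice
# with every possible 4-bit pattern (bits mapped to {-1, +1}), MSB first.
patterns = ((np.arange(16)[:, None] >> np.arange(SUB)[::-1]) & 1) * 2 - 1  # (16, 4)
lut = np.einsum('sd,pd->sp', query.reshape(N_SUB, SUB), patterns)          # (64, 16)

def score(bits):
    # Turn each 4-bit group into its pattern index; one lookup replaces
    # four multiply-adds.
    idx = bits.reshape(N_SUB, SUB) @ (1 << np.arange(SUB)[::-1])
    return lut[np.arange(N_SUB), idx].sum()

# Sanity check: the table-based score matches the exact dot product.
exact = (doc_bits[0] * 2 - 1).astype(np.float32)
print(np.isclose(score(doc_bits[0]), query @ exact))
```

In production this gains speed because the 16-entry tables are small enough to sit in CPU registers, which is the register-loading trick the post mentions.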
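The clustering-plus-reranking bullets amount to an inverted-file (IVF-style) search: assign documents to clusters offline, probe only the clusters nearest the query, then rerank the surviving candidates with full-precision vectors. A toy sketch with made-up sizes (the real system works at billions of vectors):

```python
import numpy as np

rng = np.random.default_rng(2)
docs = rng.standard_normal((2000, 64)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# Toy k-means (a few Lloyd iterations) to build the cluster index.
K = 20
centroids = docs[rng.choice(len(docs), K, replace=False)].copy()
for _ in range(10):
    assign = (docs @ centroids.T).argmax(axis=1)   # nearest centroid per doc
    for k in range(K):
        members = docs[assign == k]
        if len(members):
            c = members.mean(axis=0)
            centroids[k] = c / np.linalg.norm(c)

def search(query, n_probe=2, top_k=5):
    # Probe only the few clusters nearest the query instead of all documents...
    near = (centroids @ query).argsort()[-n_probe:]
    cand = np.flatnonzero(np.isin(assign, near))
    # ...then rerank the candidates with the uncompressed vectors.
    scores = docs[cand] @ query
    return cand[scores.argsort()[::-1][:top_k]]

q = docs[123] + 0.05 * rng.standard_normal(64).astype(np.float32)
results = search(q / np.linalg.norm(q))
```

Scanning ~1/10 of the clusters over ~1/10 of the data is where the large throughput multiplier comes from; the final rerank restores the accuracy the lossy steps gave up.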
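Finally, the metadata-filtering bullet: an inverted index maps each metadata value to the set of document ids that carry it, so a filter becomes cheap set intersection before any vector scoring happens. A minimal sketch with hypothetical fields (`domain`, `year`, `keywords`):

```python
from collections import defaultdict

# Hypothetical documents; in the real system each id also has an embedding.
docs = [
    {"id": 0, "domain": "arxiv.org", "year": 2024, "keywords": {"embeddings", "search"}},
    {"id": 1, "domain": "github.com", "year": 2023, "keywords": {"database"}},
    {"id": 2, "domain": "arxiv.org", "year": 2025, "keywords": {"search", "clustering"}},
]

# Inverted indexes: each metadata value maps to the set of matching doc ids.
by_domain, by_keyword, by_year = defaultdict(set), defaultdict(set), defaultdict(set)
for d in docs:
    by_domain[d["domain"]].add(d["id"])
    by_year[d["year"]].add(d["id"])
    for kw in d["keywords"]:
        by_keyword[kw].add(d["id"])

# A compound filter (domain + keyword + date range) is a set intersection;
# only the surviving ids proceed to vector similarity scoring.
allowed = by_domain["arxiv.org"] & by_keyword["search"]
allowed &= {i for y, ids in by_year.items() if y >= 2024 for i in ids}
print(sorted(allowed))  # -> [0, 2]
```

This is what lets the vector store double as a traditional filterable database: filters prune the candidate set before the (more expensive) similarity math runs.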