April 24, 2026

JinaVDR: New Visual Document Retrieval Benchmark with 95 Tasks in 20 Languages

We're releasing JinaVDR (Visual Document Retrieval), a new benchmark for evaluating how well models retrieve visually complex documents. JinaVDR covers multilingual documents with intricate layouts that combine graphs, charts, tables, text, and images, and it includes scanned copies and screenshots. The benchmark pairs these visual documents with targeted text queries, so retrieval performance can be evaluated on documents of real-world complexity and across a broader range of domains.

TL;DR

  • JinaVDR is a new benchmark for visual document retrieval, covering 95 tasks across 20 languages.
  • It evaluates models on visually complex and multilingual documents with intricate layouts like graphs, charts, and tables.
  • The benchmark incorporates diverse domains such as historic documents, legal texts, and scientific papers.
  • JinaVDR was constructed from four sources: repurposed existing datasets, manual annotation, synthetic generation, and repurposed crawled datasets.
  • Existing benchmarks like MTEB are primarily text-based, while ViDoRe and MIEB have limitations in language diversity and document complexity.
  • Benchmarking results indicate that many recent embedding models struggle with JinaVDR's tasks, with Jina-embeddings-v4 showing superior performance due to its multi-vector capability.
  • JinaVDR is being integrated into the MTEB framework to increase adoption and ease of use.
  • Limitations include subsampling datasets to normalize their size and filtering them for quality, both done for practical usability and evaluation quality.
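
As a rough illustration of what "multi-vector" retrieval can mean, the sketch below scores a query against documents using late interaction (MaxSim), a common scoring rule for multi-vector embeddings. This is an assumption for illustration only: the summary above does not say which scoring rule jina-embeddings-v4 or JinaVDR uses, and the function names, document ids, and toy vectors are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query vector, take its best match
    among the document vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_documents(query_vecs: List[Vector],
                   docs: Dict[str, List[Vector]]) -> List[str]:
    """Return document ids sorted by descending MaxSim score."""
    scores = {doc_id: maxsim_score(query_vecs, vecs)
              for doc_id, vecs in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): two query token vectors, two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],  # one vector matches each query vector
    "page_b": [[0.5, 0.5]],              # a single averaged vector
}

ranking = rank_documents(query, docs)
print(ranking)
```

In this toy case page_a scores higher because each query vector finds a close match among its vectors, whereas page_b's single vector only partially matches both. That is the intuition usually given for why multi-vector representations can help on documents that mix several kinds of content.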
