December 17, 2025

Grok-1.5 Vision Preview

Connecting the digital and physical worlds with our first multimodal model.

TL;DR

  • Grok-1.5V is a first-generation multimodal model that processes text and visual information.
  • It can understand documents, diagrams, charts, screenshots, and photographs.
  • Grok-1.5V performs competitively with other frontier multimodal models in areas such as document understanding and reasoning.
  • It outperforms peers on the new RealWorldQA benchmark for real-world spatial understanding.
  • The RealWorldQA benchmark, consisting of over 700 images, is released to the community under CC BY-ND 4.0.
  • Future developments aim to improve multimodal understanding and generation capabilities across images, audio, and video.

