The 4th Perception Test Challenge ECCV 2026

Do large multimodal models truly understand the spatial structure of the world they perceive? Given that they are largely trained on passive, internet-scale data, how do they represent environments across scales, from the tabletop in a tea preparation video to the city-scale layout of an hour-long walking tour?

Put your model to the test and win prizes totalling 20K EUR!

NEW this year: we run a unified videoQA track probing spatial intelligence in tabletop and city-scale videos. To be eligible, a single model must handle both scales.

Challenge

Speakers

Hugo Spiers
University College London
Laura Leal-Taixé
Technical University of Munich
Noah Snavely
Cornell University

Workshop Agenda

Challenge Timeline

Organizers

Viorica Patraucean
Google DeepMind
Joao Carreira
Google DeepMind
Dima Damen
University of Bristol
Andrew Zisserman
University of Oxford

Previous Workshops