The 4th Perception Test Challenge ECCV 2026

Do large multimodal models truly understand the spatial structure of the world they perceive? Given that they are largely trained on passive, internet-scale data, how do they represent environments across scales, from the tabletop in a tea preparation video to the city-scale layout of an hour-long walking tour?

Put your model to the test and win prizes totalling 20K EUR!

NEW this year: KilometerVision track probing spatial intelligence in walking tour videos, with questions on distance estimation, landmark recognition, compass directions, and map understanding.

NEW this year: KilometerAudio track probing multimodal audio-video understanding in hour-long walking tour videos.

Challenges

Speakers

Hugo Spiers
University College London
Laura Leal-Taixé
Technical University of Munich
Noah Snavely
Cornell University

Workshop Agenda

Challenge Timeline

Organizers

Viorica Patraucean
Google DeepMind
Fedor Kitashov
Google DeepMind
Joao Carreira
Google DeepMind
Dima Damen
Bristol University
Andrew Zisserman
Oxford University

Previous workshops