How strong is your perception model? Can it track objects and points even through strong occlusions? Can it localise actions and sounds? Can it answer questions that require memory and understanding of physics, abstraction, and semantics? Can it reason over hour-long videos?
Put your model to the test and win prizes totalling 50K EUR across 6 tracks!
NEW this year: VQA is a unified track containing regular video QA, as well as questions related to point tracking, object tracking, and action localisation, all posed in a video QA format.
NEW this year: Points&Objects is a unified track where a single model has to track both points and objects.
NEW this year: Actions&Sounds is a unified track where a single model has to localise in time both actions and sounds.
NEW this year: Perception Test interpretability track (open for submission until Dec 1, 2025).
NEW this year: We have 2 guest tracks: KiVA (image evaluation probing visual analogy skills), and Physics-IQ (assessing whether generative models produce physics-aware videos).
We received 557 submissions from 81 teams across five tracks, and awarded runner-up and winner prizes per track. For tracks where the top-performing entries were very close in performance but took very different approaches, we awarded two teams as winners.