// measuring visual intelligence_

Physics-IQ Verified

GitHub repo | Full paper

Verified Ranking

Model results

  1. 39.5%±0.8, i2v, Yes, 2026-06-18
  2. 34.8%±0.6, i2v, Likely, 2026-06-17
  3. 33.4%±0.8, i2v, No, 2026-06-17
  4. 32.2%±0.6, i2v, No, 2026-06-17
  5. 30.3%±0.6, i2v, Yes, 2026-06-18
  6. 26.5%±0.8, i2v, Likely, 2026-06-17
  7. 25.3%±1.8, i2v, No, 2026-06-17

Metric Breakdown

Submetric leaders

Spatial

  1. Cosmos3-Super-Image2Video: 53.6±1.4
  2. Grok Imagine Video: 52.7±0.9
  3. Wan 2.2: 51.1±1.0
  4. Hunyuan Video 1.5: 47.1±1.2
  5. Cosmos3-Nano: 44.5±0.7

Spatiotemporal

  1. Cosmos3-Super-Image2Video: 30.0±1.8
  2. Sora 2: 27.0±2.2
  3. Hunyuan Video 1.5: 26.9±1.0
  4. Grok Imagine Video: 21.4±0.6
  5. Cosmos3-Nano: 20.9±0.9

Weighted Spatial

  1. Cosmos3-Super-Image2Video: 38.6±1.4
  2. Grok Imagine Video: 35.7±1.0
  3. Hunyuan Video 1.5: 29.7±0.6
  4. Cosmos3-Nano: 29.0±0.8
  5. Wan 2.2: 28.5±0.7

MSE

  1. Cosmos3-Super-Image2Video: 35.9±0.6
  2. Hunyuan Video 1.5: 30.0±1.0
  3. Grok Imagine Video: 29.6±0.4
  4. Wan 2.2: 28.9±0.4
  5. Cosmos3-Nano: 26.8±0.8

Cost Frontier

Score vs Cost ($)

* Prices are taken from leading API providers or estimated from GPU market rates as of May 2026. Effective cost is normalized to 24 FPS and 1280-pixel-wide output, with LLM prompt overhead counted separately where used. "n.d." denotes values not publicly disclosed by the model provider. GPU implementations follow each model's recommended setup as closely as we could determine.
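
The footnote does not give the normalization formula, so the following is only a minimal sketch of how a cost normalized to 24 FPS and 1280-wide output could be computed. It assumes, hypothetically, that generation cost scales linearly with frame rate and with pixel count at a fixed aspect ratio (so with width squared). The names `Clip` and `effective_cost` and the scaling rule are illustrative assumptions, not the leaderboard's actual method.

```python
from dataclasses import dataclass

TARGET_FPS = 24       # reference frame rate named in the footnote
TARGET_WIDTH = 1280   # reference output width named in the footnote


@dataclass
class Clip:
    """One generated clip as priced by a provider (hypothetical record)."""
    cost_usd: float                # raw generation price for the clip
    fps: float                     # native frame rate of the output
    width: int                     # native output width in pixels
    llm_overhead_usd: float = 0.0  # prompt-rewriting cost, tracked separately


def effective_cost(clip: Clip) -> tuple[float, float]:
    """Return (normalized generation cost, LLM overhead) in USD.

    Assumption: cost is proportional to frames per second and to pixel
    count, with pixel count proportional to width squared.
    """
    fps_scale = TARGET_FPS / clip.fps
    pixel_scale = (TARGET_WIDTH / clip.width) ** 2
    normalized = clip.cost_usd * fps_scale * pixel_scale
    # The overhead is returned unscaled and separately, mirroring the
    # footnote's "separate LLM prompt overhead".
    return normalized, clip.llm_overhead_usd


if __name__ == "__main__":
    generation, overhead = effective_cost(Clip(cost_usd=0.10, fps=12, width=640))
    print(f"generation: ${generation:.2f}, LLM overhead: ${overhead:.2f}")
```

Under these assumptions a clip rendered at half the reference frame rate and half the reference width is scaled up by a factor of eight. A different scaling rule (for example, linear in width) would change only `pixel_scale`.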