Rule-based evaluation scores on ViVerBench. We report per-task performance and the Overall score.
# | Model | Overall | CE-Obj | CE-Attr | CE-AbsP | OR-Spat | OR-NSpt | WD-SPhy | WD-DPhy | IA-BBox | IA-Point | IA-Count | SVE-Maze | SVE-FLake | SVE-Robot | SVE-GUI | STEM-Chart | STEM-LaTeX |
1 | Gemini 2.5 Pro 🥇 | 0.745 | 0.763 | 0.750 | 0.856 | 0.875 | 0.761 | 0.746 | 0.532 | 0.875 | 0.863 | 0.698 | 0.580 | 0.804 | 0.563 | 0.912 | 0.540 | 0.799 |
2 | GPT-5 🥈 | 0.744 | 0.696 | 0.737 | 0.849 | 0.725 | 0.746 | 0.775 | 0.668 | 0.831 | 0.885 | 0.659 | 0.507 | 0.743 | 0.589 | 0.856 | 0.760 | 0.876 |
3 | OpenAI o3 🥉 | 0.735 | 0.723 | 0.728 | 0.801 | 0.713 | 0.754 | 0.729 | 0.682 | 0.802 | 0.885 | 0.643 | 0.517 | 0.671 | 0.627 | 0.875 | 0.732 | 0.887 |
4 | Seed 1.5-VL | 0.731 | 0.737 | 0.763 | 0.651 | 0.779 | 0.851 | 0.588 | 0.575 | 0.903 | 0.870 | 0.610 | 0.527 | 0.718 | 0.671 | 0.833 | 0.720 | 0.907 |
5 | OpenAI o4-mini | 0.727 | 0.745 | 0.746 | 0.781 | 0.763 | 0.754 | 0.646 | 0.654 | 0.843 | 0.819 | 0.604 | 0.560 | 0.650 | 0.658 | 0.833 | 0.700 | 0.876 |
6 | OpenAI o1 | 0.715 | 0.647 | 0.754 | 0.760 | 0.704 | 0.769 | 0.675 | 0.671 | 0.758 | 0.826 | 0.626 | 0.587 | 0.646 | 0.601 | 0.764 | 0.728 | 0.902 |
7 | InternVL3.5 A28B | 0.671 | 0.688 | 0.737 | 0.637 | 0.742 | 0.799 | 0.592 | 0.500 | 0.847 | 0.796 | 0.527 | 0.503 | 0.539 | 0.519 | 0.796 | 0.640 | 0.881 |
8 | Qwen 2.5-VL 72B | 0.661 | 0.696 | 0.642 | 0.678 | 0.550 | 0.813 | 0.600 | 0.507 | 0.839 | 0.744 | 0.615 | 0.517 | 0.507 | 0.513 | 0.796 | 0.628 | 0.922 |
9 | OmniVerifier 7B (Ours) | 0.653 | 0.728 | 0.711 | 0.514 | 0.742 | 0.679 | 0.517 | 0.618 | 0.802 | 0.670 | 0.566 | 0.563 | 0.482 | 0.728 | 0.662 | 0.548 | 0.912 |
10 | GPT-4o | 0.645 | 0.540 | 0.608 | 0.671 | 0.538 | 0.731 | 0.713 | 0.500 | 0.649 | 0.744 | 0.632 | 0.570 | 0.643 | 0.563 | 0.796 | 0.656 | 0.758 |
11 | Qwen 2.5-VL 7B | 0.570 | 0.531 | 0.591 | 0.500 | 0.504 | 0.694 | 0.529 | 0.471 | 0.673 | 0.633 | 0.467 | 0.527 | 0.404 | 0.671 | 0.625 | 0.556 | 0.742 |
* | Human | 0.932 | 0.938 | 0.940 | 0.932 | 0.988 | 0.955 | 0.929 | 0.818 | 0.961 | 0.966 | 0.918 | 0.997 | 1.000 | 1.000 | 0.935 | 0.928 | 0.706 |
* | Random | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 |
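The table does not state how the Overall column is derived. As a rough sanity check, the sketch below assumes it is the unweighted mean of the 16 per-task scores; this is an assumption, although it does reproduce the reported Overall values for the Gemini 2.5 Pro and Human rows. The function name `overall_score` and the list layout are hypothetical and are not part of any released ViVerBench tooling.

```python
from statistics import mean

# Per-task scores copied from the table, in column order
# (CE-Obj ... STEM-LaTeX). Only two rows are reproduced here.
GEMINI_2_5_PRO = [0.763, 0.750, 0.856, 0.875, 0.761, 0.746, 0.532, 0.875,
                  0.863, 0.698, 0.580, 0.804, 0.563, 0.912, 0.540, 0.799]
HUMAN = [0.938, 0.940, 0.932, 0.988, 0.955, 0.929, 0.818, 0.961,
         0.966, 0.918, 0.997, 1.000, 1.000, 0.935, 0.928, 0.706]


def overall_score(task_scores):
    """Assumed aggregation: unweighted mean over tasks, rounded to 3 decimals."""
    return round(mean(task_scores), 3)


print(overall_score(GEMINI_2_5_PRO))  # compare with the Overall column (0.745)
print(overall_score(HUMAN))           # compare with the Overall column (0.932)
```

If the benchmark actually weights tasks by sample count, the aggregation would differ; only the two rows above were checked against the reported numbers.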
🚨 To submit your results to the leaderboard, please send your result JSON files to this email.