Generative Universal Verifier as Multimodal Meta-Reasoner

Introduction

We introduce Generative Universal Verifier, a novel concept and plugin designed for next-generation multimodal reasoning in vision-language models and unified multimodal models, providing the fundamental capability of reflection and refinement on visual outcomes during the reasoning and generation process. This work makes three main contributions:

(1) We build ViVerBench, a comprehensive benchmark spanning 16 categories of critical tasks for evaluating visual outcomes in multimodal reasoning. Results show that existing VLMs consistently underperform across these tasks, underscoring a substantial gap from human-level capability in reliable visual verification.

(2) We design two automated pipelines to construct large-scale visual verification data and train OmniVerifier-7B, the first omni-capable generative verifier trained for universal visual verification, which achieves notable gains on ViVerBench (+8.3). Through training, we identify three atomic capabilities in visual verification and demonstrate how they generalize and interact synergistically.

(3) We propose OmniVerifier-TTS, a sequential test-time scaling paradigm that leverages the universal verifier to bridge image generation and editing within unified models, raising the upper bound of generative ability through iterative fine-grained optimization. Beyond generation, we extend the universal verifier to broader world-modeling interleaved reasoning scenarios. Empirically, OmniVerifier-TTS achieves improvements on T2I-ReasonBench (+3.7) and GenEval++ (+4.3), outperforming existing parallel test-time scaling methods such as Best-of-N.

By endowing multimodal reasoning with reliable visual verification, OmniVerifier advances both reliable reflection during generation and scalable test-time refinement, marking a step toward more trustworthy and controllable next-generation reasoning systems.
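To make the contrast in contribution (3) concrete, the following is a minimal sketch of a sequential generate-verify-edit loop next to a parallel Best-of-N baseline. It is an illustration under stated assumptions, not the OmniVerifier-TTS implementation: the callables `generate`, `verify`, `edit`, and `score`, the `max_steps` parameter, and the toy stand-ins (an "image" modeled as a set of object names) are all hypothetical.

```python
from typing import Callable, Tuple, TypeVar

Img = TypeVar("Img")


def sequential_tts(
    prompt: str,
    generate: Callable[[str], Img],
    verify: Callable[[str, Img], Tuple[bool, str]],
    edit: Callable[[Img, str], Img],
    max_steps: int = 3,
) -> Tuple[Img, int]:
    """Generate once, then alternate verification and editing.

    Returns the final image and the number of edits applied.
    All four callables are hypothetical placeholders for real models.
    """
    image = generate(prompt)
    for step in range(max_steps):
        ok, feedback = verify(prompt, image)
        if ok:
            return image, step
        # Refine the current image instead of sampling a new one.
        image = edit(image, feedback)
    return image, max_steps


def best_of_n(
    prompt: str,
    generate: Callable[[str], Img],
    score: Callable[[str, Img], float],
    n: int = 4,
) -> Img:
    """Parallel baseline: sample n independent candidates, keep the best."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda img: score(prompt, img))


# Toy stand-ins (assumptions): an "image" is a frozenset of object names.
def toy_generate(prompt: str) -> frozenset:
    return frozenset(prompt.split()[:1])  # only renders the first object


def toy_verify(prompt: str, image: frozenset) -> Tuple[bool, str]:
    missing = [w for w in prompt.split() if w not in image]
    return (not missing, missing[0] if missing else "")


def toy_edit(image: frozenset, feedback: str) -> frozenset:
    return image | {feedback}  # add the object the verifier asked for


def toy_score(prompt: str, image: frozenset) -> float:
    return float(sum(w in image for w in prompt.split()))


final_image, num_edits = sequential_tts(
    "cat dog bird", toy_generate, toy_verify, toy_edit, max_steps=5
)
baseline_image = best_of_n("cat dog bird", toy_generate, toy_score, n=4)
```

In this toy setting the sequential loop repairs the same image step by step using the verifier's feedback, while Best-of-N can only pick among independent samples that all share the generator's failure mode. Real generators are stochastic, so the gap in practice depends on the models involved.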

Leaderboard on ViVerBench

Rule-based evaluation scores on ViVerBench. We report per-task performance and the Overall score.

#	Model	Overall	CE-Obj	CE-Attr	CE-AbsP	OR-Spat	OR-NSpt	WD-SPhy	WD-DPhy	IA-BBox	IA-Point	IA-Count	SVE-Maze	SVE-FLake	SVE-Robot	SVE-GUI	STEM-Chart	STEM-LaTeX
1	Gemini 2.5 Pro 🥇	0.745	0.763	0.750	0.856	0.875	0.761	0.746	0.532	0.875	0.863	0.698	0.580	0.804	0.563	0.912	0.540	0.799
2	GPT-5 🥈	0.744	0.696	0.737	0.849	0.725	0.746	0.775	0.668	0.831	0.885	0.659	0.507	0.743	0.589	0.856	0.760	0.876
3	OpenAI o3 🥉	0.735	0.723	0.728	0.801	0.713	0.754	0.729	0.682	0.802	0.885	0.643	0.517	0.671	0.627	0.875	0.732	0.887
4	Seed 1.5-VL	0.731	0.737	0.763	0.651	0.779	0.851	0.588	0.575	0.903	0.870	0.610	0.527	0.718	0.671	0.833	0.720	0.907
5	OpenAI o4-mini	0.727	0.745	0.746	0.781	0.763	0.754	0.646	0.654	0.843	0.819	0.604	0.560	0.650	0.658	0.833	0.700	0.876
6	OpenAI o1	0.715	0.647	0.754	0.760	0.704	0.769	0.675	0.671	0.758	0.826	0.626	0.587	0.646	0.601	0.764	0.728	0.902
7	InternVL3.5 A28B	0.671	0.688	0.737	0.637	0.742	0.799	0.592	0.500	0.847	0.796	0.527	0.503	0.539	0.519	0.796	0.640	0.881
8	Qwen 2.5-VL 72B	0.661	0.696	0.642	0.678	0.550	0.813	0.600	0.507	0.839	0.744	0.615	0.517	0.507	0.513	0.796	0.628	0.922
9	OmniVerifier 7B (Ours)	0.653	0.728	0.711	0.514	0.742	0.679	0.517	0.618	0.802	0.670	0.566	0.563	0.482	0.728	0.662	0.548	0.912
10	GPT-4o	0.645	0.540	0.608	0.671	0.538	0.731	0.713	0.500	0.649	0.744	0.632	0.570	0.643	0.563	0.796	0.656	0.758
11	Qwen 2.5-VL 7B	0.570	0.531	0.591	0.500	0.504	0.694	0.529	0.471	0.673	0.633	0.467	0.527	0.404	0.671	0.625	0.556	0.742
*	Human	0.932	0.938	0.940	0.932	0.988	0.955	0.929	0.818	0.961	0.966	0.918	0.997	1.000	1.000	0.935	0.928	0.706
*	Random	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500
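As a quick sanity check on how to read the table, the sketch below recomputes an Overall score as the unweighted mean of the 16 per-task scores. This aggregation rule is an assumption: it is not stated here, but it reproduces the reported Overall values for the two rows checked (Gemini 2.5 Pro and Qwen 2.5-VL 7B) after rounding to three decimals.

```python
# Per-task scores copied from the leaderboard rows, in column order
# (CE-Obj ... STEM-LaTeX).
rows = {
    "Gemini 2.5 Pro": [
        0.763, 0.750, 0.856, 0.875, 0.761, 0.746, 0.532, 0.875,
        0.863, 0.698, 0.580, 0.804, 0.563, 0.912, 0.540, 0.799,
    ],
    "Qwen 2.5-VL 7B": [
        0.531, 0.591, 0.500, 0.504, 0.694, 0.529, 0.471, 0.673,
        0.633, 0.467, 0.527, 0.404, 0.671, 0.625, 0.556, 0.742,
    ],
}

# Overall values as reported in the table.
reported_overall = {"Gemini 2.5 Pro": 0.745, "Qwen 2.5-VL 7B": 0.570}


def macro_average(scores):
    """Unweighted mean over tasks (assumed aggregation rule)."""
    return sum(scores) / len(scores)


recomputed = {name: round(macro_average(s), 3) for name, s in rows.items()}
for name, value in recomputed.items():
    print(f"{name}: recomputed={value:.3f} reported={reported_overall[name]:.3f}")
```

If the benchmark instead weights tasks by sample count, the agreement on these rows would be coincidental, so treat the rule as a working assumption when comparing your own results.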

🚨 To submit your results to the leaderboard, please send your result JSON files to this email.

BibTeX

@article{zhang2025generative,
  author  = {Zhang, Xinchen and Zhang, Xiaoying and Wu, Youbin and Cao, Yanbin and Zhang, Renrui and Chu, Ruihang and Yang, Ling and Yang, Yujiu},
  title   = {Generative Universal Verifier as Multimodal Meta-Reasoner},
  journal = {arXiv preprint arXiv:2510.13804},
  year    = {2025}
}