Exploring a Collaborative Gamified Approach to Vision-Language Model Evaluation

概要

Vision-language models demonstrate impressive capabilities, yet constructing evaluation datasets that capture their failure cases still relies heavily on tedious manual annotation. We present a gamified, collaborative platform in which pairs of participants naturally uncover AI weaknesses through engaging gameplay. Players describe and identify images drawn from visually similar sets, while a vision-language model performs the same guessing tasks in parallel, allowing direct comparison between human and AI performance. In a preliminary study with 68 participants across 137 rounds, the game format sustained high enjoyment while producing effective data. Overall, our work shows how gamified interaction can shift AI evaluation from static, labor-intensive dataset construction to dynamic human–AI comparison, and offers design implications for crowdsourcing and evaluation platforms.

収録
Proceedings of the Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems