Using ARC Puzzles to Help Children Identify Reasoning Errors in Generative AI

The integration of generative Artificial Intelligence (genAI) into everyday life raises questions about the competencies required to critically engage with these technologies. Unlike visual errors in genAI images, mistakes in AI-generated text are often harder to detect and require domain knowledge. Furthermore, AI’s authoritative tone and structured responses can create an illusion of correctness, leading to overtrust, especially among children. To address this challenge, we developed AI Puzzlers, an interactive system designed to help children (ages 6+) critically engage with genAI’s reasoning by solving visual puzzles from the Abstraction and Reasoning Corpus (ARC). Accessible through any web browser, AI Puzzlers requires no prior knowledge of AI or programming, making it an easy entry point for young learners. The system presents puzzles that children can easily solve alongside AI-generated solutions. By visually comparing genAI’s solution with their own, children can identify when genAI makes mistakes rather than being misled by polished yet incorrect answers. Additionally, much like debugging in programming, children can pinpoint mistakes in genAI’s approach and suggest corrections by typing them into a “Hint” field. Through this interaction, children begin to recognize both the limitations of genAI and the unique strengths of human reasoning.
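The side-by-side comparison at the heart of this interaction can be sketched as a cell-by-cell grid diff. This is a minimal illustration under standard ARC conventions (a grid is a list of rows of integer color codes 0-9), not the actual AI Puzzlers implementation; all function and variable names here are hypothetical.

```python
# Hypothetical sketch of comparing a child's ARC-style solution grid with a
# genAI-generated one. Grids are lists of rows of integer color codes (0-9).

def diff_grids(child_grid, ai_grid):
    """Return the (row, col) coordinates where the AI's grid differs
    from the child's, or a shape-mismatch flag if the grids don't align."""
    # Differing dimensions are themselves an easy-to-see reasoning error.
    if len(ai_grid) != len(child_grid) or any(
        len(a) != len(c) for a, c in zip(ai_grid, child_grid)
    ):
        return "shape mismatch"
    return [
        (r, c)
        for r, row in enumerate(child_grid)
        for c, _ in enumerate(row)
        if ai_grid[r][c] != child_grid[r][c]
    ]

# Example: the child completes a 2x2 pattern; the AI flips one cell.
child = [[1, 0], [0, 1]]
ai = [[1, 0], [1, 1]]
print(diff_grids(child, ai))  # -> [(1, 0)]
```

Highlighting the returned coordinates is what lets even pre-readers spot the AI's error visually, without parsing any text.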
Our findings show that AI Puzzlers provided a tangible way for children to engage with genAI’s reasoning by making its outputs visually comparable to their own. Even younger children, who were not yet fluent readers, quickly detected inconsistencies in AI-generated solutions by evaluating their visual outputs. When genAI made mistakes — especially on puzzles they considered easy — children reacted with surprise and amusement, sparking meaningful dialogue around how “AI thinks”. This also helped them recognize that genAI approaches problem-solving differently from humans and, despite its strengths, has limitations that require careful evaluation of its outputs. Their continued engagement with AI Puzzlers highlights the importance of designing genAI systems that present information in ways that facilitate comparison, encourage reflection, and scaffold multiple ways of understanding. Designed this way, such systems make children more likely to persist and to critically evaluate AI outputs.
Sample Publications:
Dangol et al. Under Review. “AI just keeps guessing”: Using ARC Puzzles to Help Children Identify Reasoning Errors in Generative AI
Dangol et al. Under Review. Children’s Mental Models of AI Reasoning: Implications for AI Literacy Education
Project Team

Aayushi Dangol
Graduate Student Researcher

Robert Wolfe
Graduate Student Researcher

Runhua Zhao
Graduate Student Researcher

Jaewon Kim
Graduate Student Researcher

Trushaa Ramanan
Undergraduate Student Researcher

Jason Yip
Associate Professor

Julie Kientz
Professor